SYNTHETIC PEPTIDES AND USES THEREFORE



FIELD OF THE INVENTION 

THIS INVENTION relates generally to agents for modulating immune responses. 
More particularly, the present invention relates to a synthetic polypeptide comprising a 
plurality of different segments of a parent polypeptide, wherein the segments are linked to
each other such that one or more functions of the parent polypeptide are impeded, 
abrogated or otherwise altered and such that the synthetic polypeptide, when introduced 
into a suitable host, can elicit an immune response against the parent polypeptide. The 
invention also relates to synthetic polynucleotides encoding the synthetic polypeptides and 
to synthetic constructs comprising these polynucleotides. The invention further relates to
the use of the polypeptides and polynucleotides of the invention in compositions for 
modulating immune responses. The invention also extends to methods of using such 
compositions for prophylactic and/or therapeutic purposes. 

Bibliographic details of various publications referred to in this specification are 
collected at the end of the description.

BACKGROUND OF THE INVENTION 

The modern reductionist approach to vaccine and therapy development has been
pursued for a number of decades and attempts to focus only on those parts of pathogens or
of cancer proteins which are relevant to the immune system. To date the performance of
this approach has been relatively poor considering the vigorous research carried out and
the number of effective vaccines and therapies that it has produced. This approach is still 
being actively pursued, however, despite its poor performance because vaccines developed 
using this approach are often extremely safe and because only by completely 
understanding the immune system can new vaccine strategies be developed. 

One area that has benefited greatly from research efforts is knowledge about how
the adaptive immune system operates and more specifically how T and B cells learn to
recognise specific parts of pathogens and cancers. T cells are mainly involved in
cell-mediated immunity whereas B cells are involved in the generation of antibody-mediated
immunity. The two most important types of T cells involved in adaptive cellular immunity
are αβ CD8+ cytotoxic T lymphocytes (CTL) and CD4+ T helper lymphocytes. CTL are
important mediators of cellular immunity against many viruses, tumours, some bacteria
and some parasites because they are able to kill infected cells directly and secrete various
factors which can have powerful effects on the spread of infectious organisms. CTLs
recognise epitopes derived from foreign intracellular proteins, which are 8-10 amino acids
long and which are presented by class I major histocompatibility complex (MHC)
molecules (in humans called human leukocyte antigens - HLAs) (Jardetzky et al., 1991;
Fremont et al., 1992; Rotzschke et al., 1990). T helper cells enhance and regulate CTL
responses and are necessary for the establishment of long-lived memory CTL. They also
inhibit infectious organisms by secreting cytokines such as IFN-γ. T helper cells recognise
epitopes derived mostly from extracellular proteins which are 12-25 amino acids long and
which are presented by class II MHC molecules (Chicz et al., 1993; Newcomb et al.,
1993). B cells, or more specifically the antibodies they secrete, are important mediators in
the control and clearance of mostly extracellular organisms. Antibodies recognise mainly
conformational determinants on the surface of organisms, for example, although
sometimes they may recognise short linear determinants.

Despite significant advances towards understanding how T and linear B cell
epitopes are processed and presented to the immune system, the full potential of
epitope-based vaccines has not been exploited. The main reason for this is the large number
of different T cell epitopes, which have to be included in such vaccines to cover the
extreme HLA polymorphism in the human population. The human HLA diversity is one of
the main reasons why whole pathogen vaccines frequently provide better population
coverage than subunit or peptide-based vaccine strategies. There is, though, a range of
epitope-based strategies which have tried to solve this problem, e.g., peptide blends, peptide
conjugates and polyepitope vaccines (i.e., comprising strings of multiple epitopes) (Dyall et
al., 1995; Thomson et al., 1996; Thomson et al., 1998; Thomson et al., 1998). These
approaches, however, will always be sub-optimal not only because of the slow pace of
epitope characterisation but also because it is virtually impossible for them to cover every
existing HLA polymorphism in the population. A number of strategies have sought to
avoid both problems by not identifying epitopes and instead incorporating larger amounts
of sequence information, e.g., approaches using whole genes or proteins and approaches
that mix multiple protein or gene sequences together. The proteins used by these strategies,
however, sometimes still function and therefore can compromise vaccine safety, e.g., whole
cancer proteins. Alternative strategies have tried to improve the safety of vaccines by
fragmenting the genes and expressing them either separately or as complex mixtures, e.g.,
library DNA immunisation, or by ligating such fragments back together. These approaches
are still sub-optimal because they are too complex, generate poor levels of immunity,
and cannot guarantee that all proteins no longer function and/or that all fragments are present,
which compromises substantially complete immunological coverage.

The lack of a safe and efficient vaccine strategy that can provide substantially
complete immunological coverage is an important problem, especially when trying to
develop vaccines against rapidly mutating and persistent viruses such as HIV and hepatitis
C virus, because partial population coverage could allow vaccine-resistant pathogens to
re-emerge in the future. Human immunodeficiency virus (HIV) is an RNA lentivirus
approximately 9 kb in length, which infects CD4+ T cells, causing T cell decline and AIDS
typically 3-8 years after infection. It is currently the most serious human viral infection,
evidenced by the number of people currently infected with HIV or who have died from
AIDS, estimated by the World Health Organisation (WHO) and UNAIDS in their AIDS
epidemic update (December 1999) to be 33.6 and 16.3 million people, respectively. The
spread of HIV is also now increasing fastest in areas of the world where over half of the
human population reside, hence an effective vaccine is desperately needed to curb the
spread of this epidemic. Despite the urgency, an effective vaccine for HIV is still some
way off because of delays in defining the correlates of immune protection, lack of a
suitable animal model, existence of up to 8 different subtypes of HIV and a high HIV
mutation rate.



A significant amount of research has been carried out to try and develop a vaccine
capable of generating neutralising antibody responses that can protect against field isolates
of HIV. Despite these efforts, it is now clear that the variability, instability and
inaccessibility of critical determinants on the HIV envelope protein will make it extremely
difficult and perhaps impossible to develop such a vaccine (Kwong et al., 1998). The
limited ability of antibodies to block HIV infection is also supported by the observation
that development of AIDS correlates primarily with a reduction in CTL responsiveness to
HIV and not with altered antibody levels (Ogg et al., 1998). Hence CTL-mediated and not
antibody-mediated responses appear to be critical for maintaining the asymptomatic state
in vivo. There is also some evidence to suggest that pre-existing HIV-specific CTL
responses can block the establishment of a latent HIV infection. This evidence comes from
a number of cases where individuals have generated HIV-specific CTL responses without
becoming infected and appear to be protected from establishing latent HIV infections
despite repeated virus exposure (Rowland-Jones et al., 1995; Parmiani 1998). Taken
together, these observations suggest that a vaccine capable of generating a broad range of
strong CTL responses may be able to stop individuals from becoming latently infected
with HIV or at least allow infected individuals to remain asymptomatic for life. Virtually
all of the candidate HIV vaccines developed to date have been derived from subtype B
HIV proteins (western world subtype) whereas the majority of the HIV infections
worldwide are caused by subtypes A/E or C (E and A are similar except in the envelope
protein) (referred to as developing world subtypes). Hence existing candidate vaccines may
not be suitable for the more common HIV subtypes. Recently, there has been some
evidence that B subtype vaccines may be partially effective against other common HIV
subtypes (Rowland-Jones et al., 1998). Accordingly, there remains a need for a vaccine
whose effectiveness is substantially complete against all isolates of all strains of
HIV.

SUMMARY OF THE INVENTION

The present invention is predicated in part on a novel strategy for enhancing the 
efficacy of an immunopotentiating composition. This strategy involves utilising the 
sequence information of a parent polypeptide to produce a synthetic polypeptide that 
comprises a plurality of different segments of the parent polypeptide, which are linked
sequentially together in a different arrangement relative to that of the parent polypeptide. 
As a result of this change in relationship, the sequence of the linked segments in the 
synthetic polypeptide is different to a sequence contained within the parent polypeptide. As 
more fully described hereinafter, the present strategy is used advantageously to cause 
significant disruption to the structure and/or function of the parent polypeptide while
minimising the destruction of potentially useful epitopes encoded by the parent 
polypeptide. 

Thus, in one aspect of the present invention, there is provided a synthetic 
polypeptide comprising a plurality of different segments of at least one parent polypeptide, 
wherein the segments are linked together in a different relationship relative to their linkage
in the at least one parent polypeptide. 

In one embodiment, the synthetic polypeptide consists essentially of different 
segments of a single parent polypeptide. 

In an alternate embodiment, the synthetic polypeptide consists essentially of 
different segments of a plurality of different parent polypeptides.

Suitably, said segments in said synthetic polypeptide are linked sequentially in a 
different order or arrangement relative to that of corresponding segments in said at least 
one parent polypeptide. 

Preferably, at least one of said segments comprises partial sequence identity or 
homology to one or more other said segments. The sequence identity or homology is
preferably contained at one or both ends of said at least one segment. 
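
By way of illustration only, segments that share sequence identity at their ends can be obtained by cutting a parent sequence into overlapping windows. The following Python sketch is not taken from the specification: the function name overlapping_segments, the toy sequence and the window and overlap sizes are all assumptions chosen for the example.

```python
def overlapping_segments(parent: str, size: int, overlap: int) -> list[str]:
    """Cut `parent` into windows of `size` residues; neighbours share `overlap` residues."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    segments = []
    for start in range(0, len(parent), step):
        segments.append(parent[start:start + size])
        if start + size >= len(parent):
            break  # the last window already reaches the end of the parent
    return segments


# Arbitrary toy sequence (hypothetical, not an HIV sequence); sizes are illustrative.
parent = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
segments = overlapping_segments(parent, size=12, overlap=6)
for left, right in zip(segments, segments[1:]):
    # Each segment shares identity with its neighbour at the adjoining ends.
    assert left[-6:] == right[:6]
print(segments)
```

Because every residue falls in at least one window, an epitope that straddles one segment boundary is still present intact in the neighbouring segment, provided it is no longer than the overlap.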

In another aspect, the invention resides in a synthetic polynucleotide encoding the 
synthetic polypeptide as broadly described above.

According to yet another aspect, the invention contemplates a synthetic construct
comprising a polynucleotide as broadly described above that is operably linked to a
regulatory polynucleotide.

In a further aspect of the invention, there is provided a method for producing a 
synthetic polynucleotide as broadly described above, comprising:

- linking together in the same reading frame a plurality of nucleic acid sequences 
encoding different segments of at least one parent polypeptide to form a synthetic 
polynucleotide whose sequence encodes said segments linked together in a different 
relationship relative to their linkage in the at least one parent polypeptide. 

Preferably, the method further comprises fragmenting the sequence of a respective
parent polypeptide into fragments and linking said fragments together in a different
relationship relative to their linkage in said parent polypeptide sequence. In a preferred 
embodiment of this type, the fragments are randomly linked together. 



Suitably, the method further comprises reverse translating the sequence of a
respective parent polypeptide or a segment thereof to provide a nucleic acid sequence
encoding said parent polypeptide or said segment. In a preferred embodiment of this type,
an amino acid of said parent polypeptide sequence is reverse translated to provide a codon,
which has higher translational efficiency than other synonymous codons in a cell of
interest. Suitably, an amino acid of said parent polypeptide sequence is reverse translated
to provide a codon which, in the context of adjacent or local sequence elements, has a
lower propensity of forming an undesirable sequence (e.g., a palindromic sequence or a
duplicated sequence) that is refractory to the execution of a task (e.g., cloning or
sequencing).

In another aspect, the invention encompasses a computer program product for
designing the sequence of a synthetic polypeptide as broadly described above, comprising:

- code that receives as input the sequence of at least one parent polypeptide;

- code that fragments the sequence of a respective parent polypeptide into
fragments;

- code that links together said fragments in a different relationship relative to their
linkage in said parent polypeptide sequence; and

- a computer readable medium that stores the codes.
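
The three code elements listed above (receive, fragment, link in a different relationship) can be sketched as follows. This is a minimal illustration under stated assumptions, not the program of the invention: the function names, the fixed fragment size, the seeded random shuffle and the toy parent sequence are all hypothetical.

```python
import random


def fragment(parent: str, size: int) -> list[str]:
    """Cut the parent sequence into consecutive, non-overlapping fragments."""
    return [parent[i:i + size] for i in range(0, len(parent), size)]


def rearrange(fragments: list[str], rng: random.Random, max_tries: int = 1000) -> list[str]:
    """Shuffle fragments until the joined sequence differs from the parent order."""
    parent = "".join(fragments)
    order = list(fragments)
    for _ in range(max_tries):
        rng.shuffle(order)
        if "".join(order) != parent:
            return order
    raise ValueError("no arrangement differing from the parent was found")


def design_synthetic_polypeptide(parent: str, size: int = 10, seed: int = 0) -> str:
    """Receive a parent sequence, fragment it, and link the fragments in a new order."""
    rng = random.Random(seed)  # seeded so that a design can be reproduced
    return "".join(rearrange(fragment(parent, size), rng))


# Arbitrary toy parent sequence (hypothetical; 30 residues, three fragments of 10).
parent = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"
synthetic = design_synthetic_polypeptide(parent, size=10, seed=1)
assert synthetic != parent
assert sorted(fragment(synthetic, 10)) == sorted(fragment(parent, 10))
print(synthetic)
```

The check inside rearrange compares the joined sequences rather than the fragment lists, so a shuffle that merely swaps two identical fragments is not mistaken for a rearrangement.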

In yet another aspect, the invention provides a computer program product for 
designing the sequence of a synthetic polynucleotide as broadly described above,
comprising: 

- code that receives as input the sequence of at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that reverse translates the sequence of a respective fragment to provide a
nucleic acid sequence encoding said fragment;

- code that links together in the same reading frame each said nucleic acid
sequence to provide a polynucleotide sequence that codes for a polypeptide sequence in
which said fragments are linked together in a different relationship relative to their
linkage in the at least one parent polypeptide sequence; and

- a computer readable medium that stores the codes.
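
The reverse translation and in-frame linkage steps listed above might be sketched as shown below. The codon preference table is an assumption made for illustration (one valid codon per amino acid, loosely following codons often reported as frequent in human genes); it is not the table used in the specification, and the function names are hypothetical. The sketch does not screen for palindromic or duplicated sequences.

```python
# Illustrative (assumed) preferred codon for each of the 20 standard amino acids.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}

# Standard genetic code, built from the conventional TCAG-ordered table.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def reverse_translate(fragment: str) -> str:
    """Replace each amino acid with its preferred codon."""
    return "".join(PREFERRED_CODON[aa] for aa in fragment)


def link_in_frame(fragments: list[str]) -> str:
    """Join reverse-translated fragments; each is a whole number of codons,
    so simple concatenation keeps every fragment in the same reading frame."""
    return "".join(reverse_translate(f) for f in fragments)


def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon (used here only as a check)."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical fragments, already placed in their rearranged order.
rearranged = ["MNPQRSTVWY", "ACDEFGHIKL"]
polynucleotide = link_in_frame(rearranged)
assert translate(polynucleotide) == "".join(rearranged)
print(polynucleotide)
```

Translating the designed sequence back to protein, as in the final assertion, is a simple way to confirm that the linkage preserved the reading frame.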

In still yet another aspect, the invention provides a computer for designing the 
sequence of a synthetic polypeptide as broadly described above, wherein said computer 
comprises: 

(a) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said machine-readable data comprise the
sequence of at least one parent polypeptide;

(b) a working memory for storing instructions for processing said machine-readable
data;

(c) a central-processing unit coupled to said working memory and to said
machine-readable data storage medium, for processing said machine-readable data to provide said
synthetic polypeptide sequence; and

(d) output hardware coupled to said central-processing unit, for receiving said
synthetic polypeptide sequence.

In a preferred embodiment, the processing of said machine-readable data
comprises fragmenting the sequence of a respective parent polypeptide into fragments and
linking together said fragments in a different relationship relative to their linkage in the
sequence of said parent polypeptide.

In still yet another aspect, the invention resides in a computer for designing the
sequence of a synthetic polynucleotide as broadly described above, wherein said computer
comprises:

(a) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said machine-readable data comprise the
sequence of at least one parent polypeptide;

(b) a working memory for storing instructions for processing said machine-readable
data;

(c) a central-processing unit coupled to said working memory and to said
machine-readable data storage medium, for processing said machine-readable data to provide said
synthetic polynucleotide sequence; and

(d) output hardware coupled to said central-processing unit, for receiving said
synthetic polynucleotide sequence.

In a preferred embodiment, the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments, 
reverse translating the sequence of a respective fragment to provide a nucleic acid
sequence encoding said fragment and linking together in the same reading frame each said 
nucleic acid sequence to provide a polynucleotide sequence that codes for a polypeptide 
sequence in which said fragments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide sequence. 

According to another aspect, the invention contemplates a composition,
comprising an immunopotentiating agent selected from the group consisting of a synthetic
polypeptide as broadly described above, a synthetic polynucleotide as broadly described
above and a synthetic construct as broadly described above, together with a
pharmaceutically acceptable carrier.

The composition may optionally comprise an adjuvant.

In a further aspect, the invention encompasses a method for modulating an 
immune response, which response is preferably directed against a pathogen or a cancer, 
comprising administering to a patient in need of such treatment an effective amount of an 
immunopotentiating agent selected from the group consisting of a synthetic polypeptide as
broadly described above, a synthetic polynucleotide as broadly described above and a 
synthetic construct as broadly described above, or a composition as broadly described 
above. 

According to still a further aspect of the invention, there is provided a method for 
treatment and/or prophylaxis of a disease or condition, comprising administering to a
patient in need of such treatment an effective amount of an immunopotentiating agent 
selected from the group consisting of a synthetic polypeptide as broadly described above, a 
synthetic polynucleotide as broadly described above and a synthetic construct as broadly 
described above, or a composition as broadly described above. 

The invention also encompasses the use of the synthetic polypeptide, the synthetic
polynucleotide and the synthetic construct as broadly described above in the study and
modulation of immune responses.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a diagrammatic representation showing the number of people living 
with AIDS in 1998 in various parts of the world and most prevalent HIV clades in these 
regions. Estimates generated by UNAIDS. 

Figure 2 is a graphical representation showing trends in the incidence of the
common HIV clades and estimates for the future. Graph from the International AIDS
Vaccine Initiative (IAVI).

Figure 3 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV gag [SEQ ID NO: 1] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV gag protein from the HIV Molecular Immunology 
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 4 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV pol [SEQ ID NO: 2] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV pol protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 5 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV vif [SEQ ID NO: 3] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV vif protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 6 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV vpr [SEQ ID NO: 4] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV vpr protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 7 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV tat [SEQ ID NO: 5] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV tat protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 8 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV rev [SEQ ID NO: 6] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV rev protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485.

Figure 9 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV vpu [SEQ ID NO: 7] used for the construction of an 
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade 
consensus sequences for the HIV vpu protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 10 is a diagrammatic representation showing overlapping segments of a 
parent polypeptide sequence for HIV env [SEQ ID NO: 8] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV env protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton 
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical 
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 11 is a diagrammatic representation showing overlapping segments of a
parent polypeptide sequence for HIV nef [SEQ ID NO: 9] used for the construction of an
embodiment of an HIV Savine. Also shown are the alignments of common HIV clade
consensus sequences for the HIV nef protein from the HIV Molecular Immunology
Database 1997, Editors Bette Korber, John Moore, Cristian Brander, Richard Koup, Barton
Haynes and Bruce Walker. Publisher, Los Alamos National Laboratory, Theoretical
Biology and Biophysics, Los Alamos, New Mexico, Pub LAUR 98-485. 

Figure 12 is a diagrammatic representation depicting the systematic segmentation 
of the designed degenerate consensus sequences for each HIV protein and the reverse 
translation of each segment into a DNA sequence. Also shown is the number of segments
used during random rearrangement and amino acids that were removed. Amino acids
surrounded by an open square were removed from the design, because degenerate codons
to cater for the desired amino acid combination required too many degenerate bases to
comply with the incorporation of degenerate sequence rules outlined in the description of
the invention herein. Amino acids surrounded by an open circle were removed only in the
segment concerned mainly because they were coded for in an oligonucleotide overlap
region. Amino acids marked with an asterisk were designed differently in one fragment
compared to the corresponding overlap region (see tat gene).

Figure 13 is a diagrammatic representation showing the first and second most 
frequently used codons in mammals used to reverse translate HIV protein segments. Also 
shown are all first and second most frequently used degenerate codons for two amino acids
where only one base is varied. Codons used where more than one base was varied were 
worked out in each case by comparing all the codons for each amino acid. The IUPAC 
codes for degenerate bases are also shown. 
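
As a hedged illustration of how a degenerate codon covering two amino acids can be written with IUPAC codes when only one base is varied, consider the Python sketch below. The IUPAC ambiguity codes are standard; the function names and the lysine/arginine example are assumptions for illustration and are not taken from Figure 13.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents.
CODE_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in CODE_TO_BASES.items()}


def varied_positions(codon_a: str, codon_b: str) -> int:
    """Count the positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))


def degenerate_codon(codon_a: str, codon_b: str) -> str:
    """Merge two codons that differ at no more than one base into one IUPAC codon."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be three bases long")
    if varied_positions(codon_a, codon_b) > 1:
        raise ValueError("this sketch only handles codons differing at one base")
    return "".join(BASES_TO_CODE[frozenset((x, y))] for x, y in zip(codon_a, codon_b))


def expand(codon: str) -> list[str]:
    """List every plain codon represented by a degenerate codon."""
    return ["".join(p) for p in product(*(CODE_TO_BASES[c] for c in codon))]


# Illustrative pair: AAG (lysine) and AGG (arginine) differ only at the second base.
merged = degenerate_codon("AAG", "AGG")
print(merged, expand(merged))
```

Expanding the merged codon back into its members shows exactly which codons, and hence which amino acids, a degenerate position will encode.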

Figure 14 illustrates the construction plan for the HIV Savine showing the
approximate sizes of the subcassettes, cassettes and full-length Savine cDNA and the
restriction sites involved in joining them together. Also shown are the extra sequences
added onto each subcassette during their design and a brief description of how the
subcassettes, cassettes and full-length cDNA were constructed and transferred into
appropriate DNA plasmids. Description of full-length construction: pA was cleaved with
XhoI/SalI and cloned into XhoI arms of the B cassette; pAB was cleaved with XhoI and
cloned into XhoI arms of the C cassette; full-length construct is excisable with either
XbaI/BamHI at the 5' end or BglII at the 3' end. Options for excising cassettes: A)
XbaI/BamHI at the 5' end, BglII/XhoI at the 3' end; B) XbaI/BamHI at the 5' end,
BglII/SalI at the 3' end; C) XbaI/BamHI at the 5' end, BglII/SalI at the 3' end. Cleaving
plasmid vectors: pDNAVacc is cleavable with XbaI/XhoI (DNA vaccination); pBCB07 or
pTK7.5 vectors are cleavable with BamHI/SalI (Recombinant Vaccinia); pAvipox vector
pAF09 is cleavable with BamHI/SalI (Recombinant Avipox).

Figure 15 shows the full length DNA (17253 bp) and protein sequence (5742 aas)
of the HIV Savine construct. Fragment boundaries are shown, together with the position of
each fragment in each designed HIV protein, fragment number (in brackets), spacer
residues (two alanine residues) and which fragment the spacer was for (open boxes and
arrows). The locations of residual restriction site joining sequences corresponding to
subcassette or cassette boundaries (shaded boxes) are also shown, along with start and stop
codons, Kozak sequence, the location of the murine influenza virus CTL epitope sequence
(near the 3' end), important restriction sites at each end and the position of each degenerate
amino acid (indicated by 'X').

Figure 16 depicts the layout and position of oligonucleotides in the designed DNA 
sequence for subcassette A1. The sequences which anneal to the short amplification
oligonucleotides are indicated by hatched boxes and the positions of oligonucleotide
overlap regions are dark shaded.

Figure 17: Panel (a) depicts the stepwise asymmetric PCR of the two halves of
subcassette A1 (lanes 2-5 and 7-9, respectively) and final splicing together by SOEing
(lane 10). DNA standards in lane 1 are pUC18 digested with Sau3AI. Panel (b) shows the
stepwise ligation-mediated joining and PCR amplification of each cassette as indicated.
DNA standards in lane 1 are SPP1 cut with EcoRI.

Figure 18: Panel (a) shows a summary of the construction of the DNA vaccine
plasmids that express one HIV Savine cassette. Panel (b) shows a summary of the
construction of the plasmids used for marker rescue recombination to generate Vaccinia
viruses expressing one HIV Savine cassette. Panel (c) shows a summary of the
construction of the DNA vaccine plasmids which each express a version of the full-length
HIV Savine cDNA.

Figure 19 shows restimulation of HIV-specific polyclonal CTL responses from
three HIV-infected patients by the HIV Savine constructs. PBMCs from three different
patients were restimulated for 7 days by infection with Vaccinia virus pools expressing the
HIV Savine cassettes: Pool 1 included VV-AC1 and VV-BC1; Pool 2 included VV-AC2,
VV-BC2 and VV-CC2. The restimulated PBMCs were then mixed with autologous LCLs
(effector to target ratio of 50:1), which were either uninfected or infected with either
Vaccinia viruses expressing the HIV proteins gag (VV-gag), env (VV-env) or pol
(VV-pol), VV-HIV Savine pools 1 (light bars) or 2 (dark bars) or a control Vaccinia virus
(VV-Lac), and the amount of 51Cr released was used to determine percent specific lysis. K562
cells were used to determine the level of NK cell-mediated killing in each stimulated culture.

Figure 20 is a diagrammatic representation showing CD4+ proliferation of
PBMCs from HIV-1 infected patients restimulated with either Pool 1 or Pool 2 of the HIV-1
Savine. Briefly, PBMCs were stained with CFSE and cultured for 6 days with or without
VVs encoding either Pool 1 or Pool 2 of the HIV-1 Savine. Restimulated cells were then
labelled with antibodies and analysed by FACS.

Figure 21 is a graphical representation showing the CTL response in mice
vaccinated with the HIV Savine. C57BL6 mice were immunised with the HIV-1 Savine
DNA vaccine comprising the six plasmids described in Figure 18a (100 µg total DNA was
given as 50 µg/leg i.m.). One week later Poxviruses (1x10^7 pfu) comprising Pool 1 of the
HIV-1 Savine were used to boost the immune responses. Three weeks later splenocytes
from these mice were restimulated with VV-Pool 1 or VV-Pool 2 for 5 days and the
resultant effectors used in a 51Cr release cytotoxicity assay against targets infected with
CTRVV, VV-pools or VV expressing the natural antigens from HIV-1.

Figure 22 shows immune responses of HIV Immune Macaques (vaccinated with
recombinant FPV expressing gag-pol and challenged with HIV-1 2 years prior to
experiment). Monkeys 1 and 2 were immunised once at day 0 with VV Savine pool 1
(three VVs which together express the entire HIV Savine). Monkey 3 was immunised
twice with FPV-gag-pol, i.e., Day 0 is 3 weeks after first FPV-gag-pol immunisation. A)
IFN-γ detection by ELISPOT of whole blood (0.5 mL, venous blood
heparin-anticoagulated) stimulated with Aldrithiol-2 inactivated whole HIV-1 (20 hours, 20
µg/mL). Plasma samples were then centrifuged (1000 x g) and assayed in duplicate for
antigen-specific IFN-γ using capture ELISA. B) Flow cytometric detection of HIV-1 specific
CD69+/CD8+ T cells. Freshly isolated PBMCs were stimulated with inactivated HIV-1 as
above for 16 hours, washed and labelled with the antibodies. Cells were then analysed
using a FACScalibur™ flow cytometer and data analysed using Cell-Quest software. C)
Flow cytometric detection of HIV-1 specific CD69+/CD4+ T cells carried out as in B).

Figure 23 shows a diagram of a system used to carry out the instructions encoded
by the storage medium of Figures 28 and 29.

Figure 24 depicts a flow diagram showing an embodiment of a method for
designing synthetic polynucleotides and synthetic polypeptides of the invention.

Figure 25 shows an algorithm, which inter alia utilises the steps of the method
shown in Figure 24.

Figure 26 shows an example of applying the algorithm of Figure 25 to an input
consensus polyprotein sequence of Hepatitis C 1a to execute the segmentation of the
polyprotein sequence, the rearrangement of the segments, the linkage of the rearranged
segments and the outputting of synthetic polynucleotide and polypeptide sequences for the
preparation of Savines for treating and/or preventing Hepatitis C infection.
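By way of illustration only, the segmentation, rearrangement and linkage steps named for Figure 26 may be sketched as follows. This is a minimal sketch and not the algorithm of Figure 25. The function names are hypothetical. The window of 30 residues advanced in steps of 15 is an assumption inferred from the segment lengths listed in Table A (for example, a 499 aa GAG consensus giving 33 segments, the last of 19 aa). A seeded random shuffle stands in for the rearrangement, and direct end-to-end concatenation stands in for the linkage; both are assumptions.

```python
import random


def segment(sequence, size=30, step=15):
    """Split a sequence into windows of `size` residues advanced by `step`.

    The final window is shorter when the sequence end is reached
    (assumed scheme, inferred from the lengths listed in Table A).
    """
    if not sequence:
        return []
    segments = []
    start = 0
    while True:
        segments.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break  # this window already reaches the end of the parent
        start += step
    return segments


def scramble_and_link(segments, seed=0):
    """Rearrange the segments (assumed: seeded shuffle) and join them directly."""
    order = list(segments)
    random.Random(seed).shuffle(order)
    return "".join(order)


if __name__ == "__main__":
    # Toy parent sequence of 499 residues (placeholder, not a real consensus).
    residues = "ACDEFGHIKLMNPQRSTVWY"
    toy_parent = "".join(residues[i % 20] for i in range(499))
    parts = segment(toy_parent)
    print(len(parts), len(parts[-1]))
    print(len(scramble_and_link(parts)))
```

Because each segment is kept intact, linear epitopes no longer than a segment survive the rearrangement, while the order of the segments in the parent sequence is lost.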

Figure 27 illustrates an example of applying the algorithm of Figure 25 to input
consensus melanocyte differentiation antigens (gp100, MART, TRP-1, Tyros, Trp-2,
MC1R, MUC1F and MUC1R) and to consensus melanoma specific antigens (BAGE,
GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME, TRP2IN2, NYNSO1a, NYNSO1b and
LAGE1) to facilitate segmentation of those sequences, to rearrange the segments, to link
the rearranged segments and to output synthetic polynucleotide and polypeptide sequences
for the preparation of Savines for treating and/or preventing melanoma.

Figure 28 shows a cross section of a magnetic storage medium. 

Figure 29 shows a cross section of an optically readable data storage medium.

Figure 30 shows six HIV Savine cassette sequences (A1 [SEQ ID NO: 393], A2
[SEQ ID NO: 399], B1 [SEQ ID NO: 395], B2 [SEQ ID NO: 401], C1 [SEQ ID NO: 397]
and C2 [SEQ ID NO: 403]). A1, B1 and C1 can be joined together using, for example,
convenient restriction enzyme sites provided at the ends of each cassette to construct an
embodiment of a full length HIV Savine [SEQ ID NO: 405]. A2, B2 and C2 can also be
joined together to provide another embodiment of a full length HIV Savine with 350 aa
mutations common in major HIV clades. The cassettes A/B/C can be joined into single
constructs using specific restriction enzyme sites incorporated after the start codon or
before the stop codon in the cassettes.

BRIEF DESCRIPTION OF THE SEQUENCES: SUMMARY TABLE
TABLE A















SEQ ID NO: 1 


GAG consensus polypeptide 


499 aa 


SEQ ID NO: 2 


POL consensus polypeptide 


995 aa 


SEQ ID NO: 3 


VIF consensus polypeptide 


192 aa 


SEQ ID NO: 4 


VPR consensus polypeptide 


96 aa 


SEQ ID NO: 5 


TAT consensus polypeptide 


102 aa 


SEQ ID NO: 6 


REV consensus polypeptide 


123 aa 


SEQ ID NO: 7 


VPU consensus polypeptide 


81 aa 


SEQ ID NO: 8


ENV consensus polypeptide 


651 aa 


SEQ ID NO: 9 


NEF consensus polypeptide 


206 aa 


SEQ ID NO: 10 


GAG segment 1 


90 nts 


SEQ ID NO: 11


Polypeptide encoded by SEQ ID NO: 10 


30 aa 


SEQ ID NO: 12 


GAG segment 2 


90 nts 


SEQ ID NO: 13 


Polypeptide encoded by SEQ ID NO: 12 


30 aa 


SEQ ID NO: 14 


GAG segment 3 


90 nts 


SEQ ID NO: 15 


Polypeptide encoded by SEQ ID NO: 14 


30 aa 


SEQ ID NO: 16


GAG segment 4 


90 nts 


SEQ ID NO: 17 


Polypeptide encoded by SEQ ID NO: 16 


30 aa 


SEQ ID NO: 18 


GAG segment 5 


90 nts 


SEQ ID NO: 19


Polypeptide encoded by SEQ ID NO: 18


30 aa 


SEQ ID NO: 20


GAG segment 6 


90 nts 


SEQ ID NO: 21 


Polypeptide encoded by SEQ ID NO: 20


30 aa 


SEQ ID NO: 22 


GAG segment 7 


90 nts


SEQ ID NO: 23


Polypeptide encoded by SEQ ID NO: 22 


30 aa 


SEQ ID NO: 24 


GAG segment 8 


90 nts 


SEQ ID NO: 25 


Polypeptide encoded by SEQ ID NO: 24 


30 aa 


SEQ ID NO: 26 


GAG segment 9 


90 nts 


SEQ ID NO: 27 


Polypeptide encoded by SEQ ID NO: 26 


30 aa 


SEQ ID NO: 28 


GAG segment 10 


90 nts 


SEQ ID NO: 29 


Polypeptide encoded by SEQ ID NO: 28 


30 aa 


SEQ ID NO: 30 


GAG segment 11


90 nts 


SEQ ID NO: 31 


Polypeptide encoded by SEQ ID NO: 30 


30 aa 


SEQ ID NO: 32 


GAG segment 12 


90 nts 


SEQ ID NO: 33


Polypeptide encoded by SEQ ID NO: 32 


30 aa 


SEQ ID NO: 34 


GAG segment 13


90 nts 


SEQ ID NO: 35 


Polypeptide encoded by SEQ ID NO: 34 


30 aa 


SEQ ID NO: 36 


GAG segment 14 


90 nts 


SEQ ID NO: 37 


Polypeptide encoded by SEQ ID NO: 36 


30 aa 


SEQ ID NO: 38 


GAG segment 15


90 nts 


SEQ ID NO: 39 


Polypeptide encoded by SEQ ID NO: 38 


30 aa 


SEQ ID NO: 40 


GAG segment 16 


90 nts 


SEQ ID NO: 41 


Polypeptide encoded by SEQ ID NO: 40 


30 aa 


SEQ ID NO: 42 


GAG segment 17 


90 nts 


SEQ ID NO: 43 


Polypeptide encoded by SEQ ID NO: 42 


30 aa 


SEQ ID NO: 44 


GAG segment 18 


90 nts 


SEQ ID NO: 45 


Polypeptide encoded by SEQ ID NO: 44 


30 aa 


SEQ ID NO: 46


GAG segment 19 


90 nts 



WO 01/090197 PCT/AU01/00622 



SEQ ID NO: 47 


Polypeptide encoded by SEQ ID NO: 46 


30 aa 


SEQ ID NO: 48 


GAG segment 20 


90 nts 


SEQ ID NO: 49 


Polypeptide encoded by SEQ ID NO: 48 


30 aa 


SEQ ID NO: 50 


GAG segment 21 


90 nts 


SEQ ID NO: 51 


Polypeptide encoded by SEQ ID NO: 50 


30 aa 


SEQ ID NO: 52 


GAG segment 22 


90 nts 


SEQ ID NO: 53 


Polypeptide encoded by SEQ ID NO: 52 


30 aa 


SEQ ID NO: 54 


GAG segment 23 


90 nts 


SEQ ID NO: 55 


Polypeptide encoded by SEQ ID NO: 54 


30 aa 


SEQ ID NO: 56 


GAG segment 24 


90 nts 


SEQ ID NO: 57


Polypeptide encoded by SEQ ID NO: 56 


30 aa 


SEQ ID NO: 58


GAG segment 25 


90 nts 


SEQ ID NO: 59


Polypeptide encoded by SEQ ID NO: 58 


30 aa 


SEQ ID NO: 60


GAG segment 26 


90 nts 


SEQ ID NO: 61


Polypeptide encoded by SEQ ID NO: 60 


30 aa 


SEQ ID NO: 62


GAG segment 27 


90 nts 


SEQ ID NO: 63 


Polypeptide encoded by SEQ ID NO: 62 


30 aa 


SEQ ID NO: 64 


GAG segment 28 


90 nts 


SEQ ID NO: 65 


Polypeptide encoded by SEQ ID NO: 64 


30 aa 


SEQ ID NO: 66 


GAG segment 29 


90 nts 


SEQ ID NO: 67 


Polypeptide encoded by SEQ ID NO: 66 


30 aa 


SEQ ID NO: 68


GAG segment 30 


90 nts 


SEQ ID NO: 69 


Polypeptide encoded by SEQ ID NO: 68 


30 aa 


SEQ ID NO: 70


GAG segment 31


90 nts


SEQ ID NO: 71


Polypeptide encoded by SEQ ID NO: 70 


30 aa 


SEQ ID NO: 72 


GAG segment 32 


90 nts 


SEQ ID NO: 73 


Polypeptide encoded by SEQ ID NO: 72 


30 aa 


SEQ ID NO: 74 


GAG segment 33 


57 nts 


SEQ ID NO: 75 


Polypeptide encoded by SEQ ID NO: 74


19 aa 


SEQ ID NO: 76 


POL segment 1 


90 nts 


SEQ ID NO: 77 


Polypeptide encoded by SEQ ID NO: 76


30 aa 


SEQ ID NO: 78 


POL segment 2 


90 nts 


SEQ ID NO: 79 


Polypeptide encoded by SEQ ID NO: 78 


30 aa 


SEQ ID NO: 80 


POL segment 3 


90 nts 


SEQ ID NO: 81


Polypeptide encoded by SEQ ID NO: 80


30 aa 


SEQ ID NO: 82 


POL segment 4 


90 nts 


SEQ ID NO: 83 


Polypeptide encoded by SEQ ID NO: 82


30 aa 


SEQ ID NO: 84 


POL segment 5 


90 nts 


SEQ ID NO: 85 


Polypeptide encoded by SEQ ID NO: 84


30 aa 


SEQ ID NO: 86 


POL segment 6 


90 nts 


SEQ ID NO: 87 


Polypeptide encoded by SEQ ID NO: 86 


30 aa 


SEQ ID NO: 88 


POL segment 7 


90 nts 


SEQ ID NO: 89 


Polypeptide encoded by SEQ ID NO: 88


30 aa 


SEQ ID NO: 90 


POL segment 8 


90 nts 


SEQ ID NO: 91 


Polypeptide encoded by SEQ ID NO: 90 


30 aa 


SEQ ID NO: 92 


POL segment 9 


90 nts 


SEQ ID NO: 93


Polypeptide encoded by SEQ ID NO: 92 


30 aa 


SEQ ID NO: 94 


POL segment 10 


90 nts


SEQ ID NO: 95


Polypeptide encoded by SEQ ID NO: 94


30 aa


SEQ ID NO: 96


POL segment 11


90 nts


SEQ ID NO: 97


Polypeptide encoded by SEQ ID NO: 96


30 aa


SEQ ID NO: 98


POL segment 12


90 nts


SEQ ID NO: 99


Polypeptide encoded by SEQ ID NO: 98


30 aa


SEQ ID NO: 100


POL segment 13


90 nts


SEQ ID NO: 101


Polypeptide encoded by SEQ ID NO: 100


30 aa


SEQ ID NO: 102


POL segment 14


90 nts


SEQ ID NO: 103


Polypeptide encoded by SEQ ID NO: 102


30 aa


SEQ ID NO: 104


POL segment 15


90 nts


SEQ ID NO: 105


Polypeptide encoded by SEQ ID NO: 104


30 aa


SEQ ID NO: 106


POL segment 16


90 nts


SEQ ID NO: 107


Polypeptide encoded by SEQ ID NO: 106


30 aa


SEQ ID NO: 108


POL segment 17


90 nts


SEQ ID NO: 109


Polypeptide encoded by SEQ ID NO: 108


30 aa


SEQ ID NO: 110


POL segment 18


90 nts


SEQ ID NO: 111


Polypeptide encoded by SEQ ID NO: 110


30 aa


SEQ ID NO: 112


POL segment 19


90 nts


SEQ ID NO: 113


Polypeptide encoded by SEQ ID NO: 112


30 aa


SEQ ID NO: 114


POL segment 20


90 nts


SEQ ID NO: 115


Polypeptide encoded by SEQ ID NO: 114


30 aa


SEQ ID NO: 116


POL segment 21


90 nts


SEQ ID NO: 117


Polypeptide encoded by SEQ ID NO: 116


30 aa


SEQ ID NO: 118


POL segment 22


90 nts


SEQ ID NO: 119


Polypeptide encoded by SEQ ID NO: 118 


30 aa 


SEQ ID NO: 120


POL segment 23 


90 nts 


SEQ ID NO: 121 


Polypeptide encoded by SEQ ID NO: 120 


30 aa 


SEQ ID NO: 122


POL segment 24 


90 nts 


SEQ ID NO: 123


Polypeptide encoded by SEQ ID NO: 122 


30 aa 


SEQ ID NO: 124


POL segment 25 


90 nts 


SEQ ID NO: 125


Polypeptide encoded by SEQ ID NO: 124 


30 aa 


SEQ ID NO: 126


POL segment 26 


90 nts 


SEQ ID NO: 127


Polypeptide encoded by SEQ ID NO: 126 


30 aa 


SEQ ID NO: 128


POL segment 27 


90 nts 


SEQ ID NO: 129 


Polypeptide encoded by SEQ ID NO: 128 


30 aa 


SEQ ID NO: 130


POL segment 28 


90 nts 


SEQ ID NO: 131


Polypeptide encoded by SEQ ID NO: 130 


30 aa 


SEQ ID NO: 132


POL segment 29 


90 nts 


SEQ ID NO: 133


Polypeptide encoded by SEQ ID NO: 132 


30 aa 


SEQ ID NO: 134


POL segment 30 


90 nts 


SEQ ID NO: 135


Polypeptide encoded by SEQ ID NO: 134


30 aa 


SEQ ID NO: 136


POL segment 31 


90 nts 


SEQ ID NO: 137


Polypeptide encoded by SEQ ID NO: 136 


30 aa 


SEQ ID NO: 138


POL segment 32 


90 nts 


SEQ ID NO: 139


Polypeptide encoded by SEQ ID NO: 138


30 aa 


SEQ ID NO: 140 


POL segment 33 


90 nts 


SEQ ID NO: 141


Polypeptide encoded by SEQ ID NO: 140


30 aa 


SEQ ID NO: 142


POL segment 34 


90 nts


SEQ ID NO: 143


Polypeptide encoded by SEQ ID NO: 142


30 aa


SEQ ID NO: 144


POL segment 35


90 nts


SEQ ID NO: 145


Polypeptide encoded by SEQ ID NO: 144


30 aa


SEQ ID NO: 146


POL segment 36


90 nts


SEQ ID NO: 147


Polypeptide encoded by SEQ ID NO: 146


30 aa


SEQ ID NO: 148


POL segment 37


90 nts


SEQ ID NO: 149


Polypeptide encoded by SEQ ID NO: 148


30 aa


SEQ ID NO: 150


POL segment 38


90 nts


SEQ ID NO: 151


Polypeptide encoded by SEQ ID NO: 150


30 aa


SEQ ID NO: 152


POL segment 39


90 nts


SEQ ID NO: 153


Polypeptide encoded by SEQ ID NO: 152


30 aa


SEQ ID NO: 154


POL segment 40


90 nts


SEQ ID NO: 155


Polypeptide encoded by SEQ ID NO: 154


30 aa


SEQ ID NO: 156


POL segment 41


90 nts


SEQ ID NO: 157


Polypeptide encoded by SEQ ID NO: 156


30 aa


SEQ ID NO: 158


POL segment 42


90 nts


SEQ ID NO: 159


Polypeptide encoded by SEQ ID NO: 158


30 aa


SEQ ID NO: 160


POL segment 43


90 nts


SEQ ID NO: 161


Polypeptide encoded by SEQ ID NO: 160


30 aa


SEQ ID NO: 162


POL segment 44


90 nts


SEQ ID NO: 163


Polypeptide encoded by SEQ ID NO: 162


30 aa


SEQ ID NO: 164


POL segment 45


90 nts


SEQ ID NO: 165


Polypeptide encoded by SEQ ID NO: 164


30 aa


SEQ ID NO: 166


POL segment 46


90 nts


SEQ ID NO: 167


Polypeptide encoded by SEQ ID NO: 166 


30 aa 


SEQ ID NO: 168


POL segment 47 


90 nts 


SEQ ID NO: 169 


Polypeptide encoded by SEQ ID NO: 168


30 aa 


SEQ ID NO: 170 


POL segment 48 


90 nts 


SEQ ID NO: 171


Polypeptide encoded by SEQ ID NO: 170


30 aa 


SEQ ID NO: 172


POL segment 49 


90 nts 


SEQ ID NO: 173


Polypeptide encoded by SEQ ID NO: 172 


30 aa 


SEQ ID NO: 174


POL segment 50 


90 nts 


SEQ ID NO: 175


Polypeptide encoded by SEQ ID NO: 174 


30 aa 


SEQ ID NO: 176


POL segment 51


90 nts 


SEQ ID NO: 177


Polypeptide encoded by SEQ ID NO: 176 


30 aa 


SEQ ID NO: 178


POL segment 52 


90 nts 


SEQ ID NO: 179


Polypeptide encoded by SEQ ID NO: 178


30 aa 


SEQ ID NO: 180


POL segment 53 


90 nts 


SEQ ID NO: 181


Polypeptide encoded by SEQ ID NO: 180


30 aa 


SEQ ID NO: 182 


POL segment 54 


90 nts 


SEQ ID NO: 183


Polypeptide encoded by SEQ ID NO: 182


30 aa 


SEQ ID NO: 184


POL segment 55 


90 nts 


SEQ ID NO: 185


Polypeptide encoded by SEQ ID NO: 184


30 aa 


SEQ ID NO: 186


POL segment 56 


90 nts 


SEQ ID NO: 187


Polypeptide encoded by SEQ ID NO: 186


30 aa 


SEQ ID NO: 188


POL segment 57 


90 nts 


SEQ ID NO: 189


Polypeptide encoded by SEQ ID NO: 188


30 aa 


SEQ ID NO: 190


POL segment 58 


90 nts


SEQ ID NO: 191 


Polypeptide encoded by SEQ ID NO: 190 


30 aa 


SEQ ID NO: 192 


POL segment 59 


90 nts 


SEQ ID NO: 193 


Polypeptide encoded by SEQ ID NO: 192 


30 aa 


SEQ ID NO: 194 


POL segment 60 


90 nts 


SEQ ID NO: 195 


Polypeptide encoded by SEQ ID NO: 194


30 aa 


SEQ ID NO: 196 


POL segment 61 


90 nts 


SEQ ID NO: 197 


Polypeptide encoded by SEQ ID NO: 196 


30 aa 


SEQ ID NO: 198 


POL segment 62 


90 nts 


SEQ ID NO: 199 


Polypeptide encoded by SEQ ID NO: 198 


30 aa 


SEQ ID NO: 200 


POL segment 63 


90 nts 


SEQ ID NO: 201 


Polypeptide encoded by SEQ ID NO: 200 


30 aa 


SEQ ID NO: 202 


POL segment 64 


90 nts 


SEQ ID NO: 203 


Polypeptide encoded by SEQ ID NO: 202 


30 aa 


SEQ ID NO: 204


POL segment 65 


90 nts 


SEQ ID NO: 205 


Polypeptide encoded by SEQ ID NO: 204 


30 aa 


SEQ ID NO: 206


POL segment 66 


60 nts 


SEQ ID NO: 207 


Polypeptide encoded by SEQ ID NO: 206 


20 aa 


SEQ ID NO: 208 


VIF segment 1 


90 nts 


SEQ ID NO: 209 


Polypeptide encoded by SEQ ID NO: 208 


30 aa 


SEQ ID NO: 210


VIF segment 2 


90 nts 


SEQ ID NO: 211


Polypeptide encoded by SEQ ID NO: 210 


30 aa 


SEQ ID NO: 212 


VIF segment 3 


90 nts 


SEQ ID NO: 213 


Polypeptide encoded by SEQ ID NO: 212 


30 aa 


SEQ ID NO: 214 


VIF segment 4 


90 nts


SEQ ID NO: 215


Polypeptide encoded by SEQ ID NO: 214


30 aa 


SEQ ID NO: 216


VIF segment 5 


90 nts 


SEQ ID NO: 217 


Polypeptide encoded by SEQ ID NO: 216 


30 aa 


SEQ ID NO: 218 


VIF segment 6 


90 nts 


SEQ ID NO: 219 


Polypeptide encoded by SEQ ID NO: 218 


30 aa 


SEQ ID NO: 220 


VIF segment 7


90 nts 


SEQ ID NO: 221 


Polypeptide encoded by SEQ ID NO: 220 


30 aa 


SEQ ID NO: 222 


VIF segment 8 


90 nts 


SEQ ID NO: 223 


Polypeptide encoded by SEQ ID NO: 222 


30 aa 


SEQ ID NO: 224 


VIF segment 9 


90 nts 


SEQ ID NO: 225 


Polypeptide encoded by SEQ ID NO: 224 


30 aa 


SEQ ID NO: 226 


VIF segment 10 


90 nts 


SEQ ID NO: 227 


Polypeptide encoded by SEQ ID NO: 226 


30 aa 


SEQ ID NO: 228 


VIF segment 1 1 


90 nts 


SEQ ID NO: 229 


Polypeptide encoded by SEQ ID NO: 228 


30 aa 


SEQ ID NO: 230 


VIF segment 12 


81 nts 


SEQ ID NO: 231


Polypeptide encoded by SEQ ID NO: 230 


27 aa 


SEQ ID NO: 232 


VPR segment 1 


90 nts 


SEQ ID NO: 233 


Polypeptide encoded by SEQ ID NO: 232


30 aa 


SEQ ID NO: 234 


VPR segment 2 


90 nts 


SEQ ID NO: 235


Polypeptide encoded by SEQ ID NO: 234 


30 aa 


SEQ ID NO: 236 


VPR segment 3 


90 nts 


SEQ ID NO: 237 


Polypeptide encoded by SEQ ID NO: 236 


30 aa 


SEQ ID NO: 238 


VPR segment 4 


90 nts


SEQ ID NO: 239


Polypeptide encoded by SEQ ID NO: 238 


30 aa 


SEQ ID NO: 240 


VPR segment 5 


90 nts 


SEQ ID NO: 241 


Polypeptide encoded by SEQ ID NO: 240 


30 aa 


SEQ ID NO: 242 


VPR segment 6 


63 nts 


SEQ ID NO: 243 


Polypeptide encoded by SEQ ID NO: 242 


21 aa 


SEQ ID NO: 244 


TAT segment 1 


90 nts 


SEQ ID NO: 245 


Polypeptide encoded by SEQ ID NO: 244 


30 aa 


SEQ ID NO: 246 


TAT segment 2 


90 nts 


SEQ ID NO: 247 


Polypeptide encoded by SEQ ID NO: 246 


30 aa 


SEQ ID NO: 248 


TAT segment 3 


90 nts 


SEQ ID NO: 249 


Polypeptide encoded by SEQ ID NO: 248 


30 aa 


SEQ ID NO: 250 


TAT segment 4 


90 nts 


SEQ ID NO: 251 


Polypeptide encoded by SEQ ID NO: 250 


30 aa 


SEQ ID NO: 252 


TAT segment 5 


90 nts 


SEQ ID NO: 253 


Polypeptide encoded by SEQ ID NO: 252 


30 aa 


SEQ ID NO: 254 


TAT segment 6 


81 nts 


SEQ ID NO: 255 


Polypeptide encoded by SEQ ID NO: 254 


27 aa 


SEQ ID NO: 256 


REV segment 1 


90 nts 


SEQ ID NO: 257 


Polypeptide encoded by SEQ ID NO: 256 


30 aa 


SEQ ID NO: 258 


REV segment 2 


90 nts 


SEQ ID NO: 259 


Polypeptide encoded by SEQ ID NO: 258 


30 aa 


SEQ ID NO: 260 


REV segment 3 


90 nts 


SEQ ID NO: 261 


Polypeptide encoded by SEQ ID NO: 260 


30 aa 


SEQ ID NO: 262 


REV segment 4 


90 nts


SEQ ID NO: 263


Polypeptide encoded by SEQ ID NO: 262 


30 aa 


SEQ ID NO: 264 


REV segment 5 


90 nts 


SEQ ID NO: 265 


Polypeptide encoded by SEQ ID NO: 264 


30 aa 


SEQ ID NO: 266 


REV segment 6 


90 nts 


SEQ ID NO: 267 


Polypeptide encoded by SEQ ID NO: 266 


30 aa 


SEQ ID NO: 268 


REV segment 7 


90 nts 


SEQ ID NO: 269 


Polypeptide encoded by SEQ ID NO: 268 


30 aa 


SEQ ID NO: 270 


REV segment 8 


54 nts 


SEQ ID NO: 271 


Polypeptide encoded by SEQ ID NO: 270 


18 aa 


SEQ ID NO: 272 


VPU segment 1 


90 nts 


SEQ ID NO: 273 


Polypeptide encoded by SEQ ID NO: 272 


30 aa 


SEQ ID NO: 274 


VPU segment 2 


90 nts 


SEQ ID NO: 275 


Polypeptide encoded by SEQ ID NO: 274 


30 aa 


SEQ ID NO: 276 


VPU segment 3 


90 nts 


SEQ ID NO: 277 


Polypeptide encoded by SEQ ID NO: 276 


30 aa 


SEQ ID NO: 278 


VPU segment 4 


90 nts 


SEQ ID NO: 279


Polypeptide encoded by SEQ ID NO: 278 


30 aa 


SEQ ID NO: 280 


VPU segment 5 


63 nts 


SEQ ID NO: 281 


Polypeptide encoded by SEQ ID NO: 280 


21 aa 


SEQ ID NO: 282 


ENV segment 1 


90 nts 


SEQ ID NO: 283 


Polypeptide encoded by SEQ ID NO: 282 


30 aa 


SEQ ID NO: 284 


ENV segment 2 


90 nts 


SEQ ID NO: 285 


Polypeptide encoded by SEQ ID NO: 284 


30 aa 


SEQ ID NO: 286


ENV segment 3 


90 nts


SEQ ID NO: 287


Polypeptide encoded by SEQ ID NO: 286 


30 aa


SEQ ID NO: 288 


ENV segment 4 


90 nts 


SEQ ID NO: 289 


Polypeptide encoded by SEQ ID NO: 288 


30 aa 


SEQ ID NO: 290 


ENV segment 5 


90 nts 


SEQ ID NO: 291 


Polypeptide encoded by SEQ ID NO: 290 


30 aa 


SEQ ID NO: 292 


ENV segment 6 


90 nts 


SEQ ID NO: 293 


Polypeptide encoded by SEQ ID NO: 292 


30 aa 


SEQ ID NO: 294 


ENV segment 7 


90 nts 


SEQ ID NO: 295 


Polypeptide encoded by SEQ ID NO: 294 


30 aa 


SEQ ID NO: 296 


ENV segment 8 


90 nts 


SEQ ID NO: 297 


Polypeptide encoded by SEQ ID NO: 296 


30 aa 


SEQ ID NO: 298 


ENV segment 9 


57 nts 


SEQ ID NO: 299 


Polypeptide encoded by SEQ ID NO: 298 


19 aa 


SEQ ID NO: 300 


GAP A segment 1 


90 nts 


SEQ ID NO: 301 


Polypeptide encoded by SEQ ID NO: 300


30 aa 


SEQ ID NO: 302 


GAP A segment 2 


90 nts 


SEQ ID NO: 303 


Polypeptide encoded by SEQ ID NO: 302 


30 aa 


SEQ ID NO: 304 


GAP A segment 3 


90 nts 


SEQ ID NO: 305 


Polypeptide encoded by SEQ ID NO: 304 


30 aa 


SEQ ID NO: 306 


GAP A segment 4 


90 nts 


SEQ ID NO: 307 


Polypeptide encoded by SEQ ID NO: 306 


30 aa 


SEQ ID NO: 308 


GAP A segment 5 


90 nts 


SEQ ID NO: 309 


Polypeptide encoded by SEQ ID NO: 308 


30 aa 


SEQ ID NO: 310 


GAP A segment 6 


90 nts


SEQ ID NO: 311


Polypeptide encoded by SEQ ID NO: 310 


30 aa 


SEQ ID NO: 312


GAP A segment 7 


75 nts 


SEQ ID NO: 313


Polypeptide encoded by SEQ ID NO: 312 


25 aa


SEQ ID NO: 314


GAP B segment 1 


90 nts 


SEQ ID NO: 315


Polypeptide encoded by SEQ ID NO: 314


30 aa 


SEQ ID NO: 316


GAP B segment 2 


90 nts 


SEQ ID NO: 317


Polypeptide encoded by SEQ ID NO: 316 


30 aa 


SEQ ID NO: 318


GAP B segment 3 


90 nts 


SEQ ID NO: 319


Polypeptide encoded by SEQ ID NO: 318 


30 aa 


SEQ ID NO: 320 


GAP B segment 4 


90 nts 


SEQ ID NO: 321


Polypeptide encoded by SEQ ID NO: 320 


30 aa 


SEQ ID NO: 322


GAP B segment 5 


90 nts 


SEQ ID NO: 323 


Polypeptide encoded by SEQ ID NO: 322 


30 aa 


SEQ ID NO: 324 


GAP B segment 6 


90 nts 


SEQ ID NO: 325


Polypeptide encoded by SEQ ID NO: 324 


30 aa 


SEQ ID NO: 326 


GAP B segment 7 


90 nts 


SEQ ID NO: 327 


Polypeptide encoded by SEQ ID NO: 326 


30 aa 


SEQ ID NO: 328 


GAP B segment 8 


90 nts 


SEQ ID NO: 329 


Polypeptide encoded by SEQ ID NO: 328 


30 aa 


SEQ ID NO: 330 


GAP B segment 9 


90 nts


SEQ ID NO: 331


Polypeptide encoded by SEQ ID NO: 330 


30 aa 


SEQ ID NO: 332 


GAP B segment 10 


90 nts 


SEQ ID NO: 333 


Polypeptide encoded by SEQ ID NO: 332 


30 aa 


SEQ ID NO: 334 


GAP B segment 11


90 nts


SEQ ID NO: 335


Polypeptide encoded by SEQ ID NO: 334 


30 aa 


SEQ ID NO: 336 


GAP B segment 12 


90 nts 


SEQ ID NO: 337 


Polypeptide encoded by SEQ ID NO: 336 


30 aa 


SEQ ID NO: 338 


GAP B segment 13 


90 nts 


SEQ ID NO: 339 


Polypeptide encoded by SEQ ID NO: 338 


30 aa 


SEQ ID NO: 340 


GAP B segment 14 


90 nts 


SEQ ID NO: 341 


Polypeptide encoded by SEQ ID NO: 340


30 aa 


SEQ ID NO: 342 


GAP B segment 15


90 nts 


SEQ ID NO: 343 


Polypeptide encoded by SEQ ID NO: 342 


30 aa 


SEQ ID NO: 344 


GAP B segment 16 


90 nts 


SEQ ID NO: 345 


Polypeptide encoded by SEQ ID NO: 344 


30 aa 


SEQ ID NO: 346 


GAP B segment 17 


90 nts 


SEQ ID NO: 347 


Polypeptide encoded by SEQ ID NO: 346 


30 aa 


SEQ ID NO: 348 


GAP B segment 18


90 nts 


SEQ ID NO: 349 


Polypeptide encoded by SEQ ID NO: 348 


30 aa 


SEQ ID NO: 350 


GAP B segment 19 


90 nts 


SEQ ID NO: 351 


Polypeptide encoded by SEQ ID NO: 350


30 aa 


SEQ ID NO: 352 


GAP B segment 20 


90 nts 


SEQ ID NO: 353 


Polypeptide encoded by SEQ ID NO: 352 


30 aa 


SEQ ID NO: 354 


GAP B segment 21 


90 nts 


SEQ ID NO: 355 


Polypeptide encoded by SEQ ID NO: 354 


30 aa 


SEQ ID NO: 356 


GAP B segment 22 


90 nts 


SEQ ID NO: 357


Polypeptide encoded by SEQ ID NO: 356


30 aa 


SEQ ID NO: 358 


GAP B segment 23 


90 nts


SEQ ID NO: 359


Polypeptide encoded by SEQ ID NO: 358 


30 aa 


SEQ ID NO: 360 


GAP B segment 24 


90 nts 


SEQ ID NO: 361 


Polypeptide encoded by SEQ ID NO: 360 


30 aa 


SEQ ID NO: 362 


GAP B segment 25 


90 nts 


SEQ ID NO: 363 


Polypeptide encoded by SEQ ID NO: 362 


30 aa 


SEQ ID NO: 364 


GAP B segment 26 


66 nts 


SEQ ID NO: 365 


Polypeptide encoded by SEQ ID NO: 364 


22 aa 


SEQ ID NO: 366 


NEF segment 1 


90 nts 


SEQ ID NO: 367 


Polypeptide encoded by SEQ ID NO: 366 


30 aa 


SEQ ID NO: 368 


NEF segment 2 


90 nts 


SEQ ID NO: 369 


Polypeptide encoded by SEQ ID NO: 368 


30 aa 


SEQ ID NO: 370 


NEF segment 3 


90 nts 


SEQ ID NO: 371 


Polypeptide encoded by SEQ ID NO: 370 


30 aa 


SEQ ID NO: 372 


NEF segment 4 


90 nts 


SEQ ID NO: 373


Polypeptide encoded by SEQ ID NO: 372 


30 aa 


SEQ ID NO: 374 


NEF segment 5 


90 nts 


SEQ ID NO: 375 


Polypeptide encoded by SEQ ID NO: 374 


30 aa 


SEQ ID NO: 376 


NEF segment 6 


90 nts 


SEQ ID NO: 377 


Polypeptide encoded by SEQ ID NO: 376 


30 aa 


SEQ ID NO: 378 


NEF segment 7 


90 nts 


SEQ ID NO: 379 


Polypeptide encoded by SEQ ID NO: 378 


30 aa 


SEQ ID NO: 380 


NEF segment 8 


90 nts 


SEQ ID NO: 381 


Polypeptide encoded by SEQ ID NO: 380 


30 aa 


SEQ ID NO: 382 


NEF segment 9 


90 nts


SEQ ID NO: 383


Polypeptide encoded by SEQ ID NO: 382 


30 aa 


SEQ ID NO: 384 


NEF segment 10 


90 nts 


SEQ ID NO: 385 


Polypeptide encoded by SEQ ID NO: 384 


30 aa 


SEQ ID NO: 386 


NEF segment 11


90 nts 


SEQ ID NO: 387 


Polypeptide encoded by SEQ ID NO: 386 


30 aa 


SEQ ID NO: 388 


NEF segment 12


90 nts 


SEQ ID NO: 389 


Polypeptide encoded by SEQ ID NO: 388 


30 aa 


SEQ ID NO: 390 


NEF segment 13


78 nts 


SEQ ID NO: 391 


Polypeptide encoded by SEQ ID NO: 390 


26 aa 


SEQ ID NO: 392 


HIV Cassette A1


5703 nts 


SEQ ID NO: 393 


Polypeptide encoded by SEQ ID NO: 392


1896 aa


SEQ ID NO: 394


HIV Cassette B1


5685 nts 


SEQ ID NO: 395 


Polypeptide encoded by SEQ ID NO: 394 


1890 aa 


SEQ ID NO: 396 


HIV Cassette C1


5925 nts 


SEQ ID NO: 397 


Polypeptide encoded by SEQ ID NO: 396 


1967 aa 


SEQ ID NO: 398 


HIV Cassette A2 


5703 nts 


SEQ ID NO: 399 


Polypeptide encoded by SEQ ID NO: 398


1896 aa


SEQ ID NO: 400 


HIV Cassette B2 


5685 nts 


SEQ ID NO: 401 


Polypeptide encoded by SEQ ID NO: 400


1890 aa


SEQ ID NO: 402 


HIV Cassette C2 


5925 nts 


SEQ ID NO: 403 


Polypeptide encoded by SEQ ID NO: 402 


1967 aa 


SEQ ID NO: 404 


HIV complete Savine


17244 nts 


SEQ ID NO: 405 


Polypeptide encoded by SEQ ID NO: 404 


5747 aa 


SEQ ID NO: 406 


HepC1a consensus polyprotein sequence


3011 aa


SEQ ID NO: 407


HepC1a segment 1


90 nts 


SEQ ID NO: 408 


Polypeptide encoded by SEQ ID NO: 407 


30 aa 


SEQ ID NO: 409 


HepC1a segment 2


90 nts 


SEQ ID NO: 410 


Polypeptide encoded by SEQ ID NO: 409


30 aa 


SEQ ID NO: 411


HepC1a segment 3


90 nts 


SEQ ID NO: 412 


Polypeptide encoded by SEQ ID NO: 411


30 aa 


SEQ ID NO: 413 


HepC1a segment 4


90 nts 


SEQ ID NO: 414 


Polypeptide encoded by SEQ ID NO: 413 


30 aa 


SEQ ID NO: 415 


HepC1a segment 5


90 nts 


SEQ ID NO: 416 


Polypeptide encoded by SEQ ID NO: 415


30 aa 


SEQ ID NO: 417 


HepC1a segment 6


90 nts 


SEQ ID NO: 418 


Polypeptide encoded by SEQ ID NO: 417 


30 aa 


SEQ ID NO: 419 


HepC1a segment 7


90 nts 


SEQ ID NO: 420 


Polypeptide encoded by SEQ ID NO: 419 


30 aa 


SEQ ID NO: 421 


HepC1a segment 8


90 nts 


SEQ ID NO: 422 


Polypeptide encoded by SEQ ID NO: 421


30 aa 


SEQ ID NO: 423 


HepC1a segment 9


90 nts 


SEQ ID NO: 424 


Polypeptide encoded by SEQ ID NO: 423


30 aa 


SEQ ID NO: 425 


HepC1a segment 10


90 nts 


SEQ ID NO: 426 


Polypeptide encoded by SEQ ID NO: 425 


30 aa 


SEQ ID NO: 427 


HepC1a segment 11


90 nts 


SEQ ID NO: 428 


Polypeptide encoded by SEQ ID NO: 427 


30 aa 


SEQ ID NO: 429 


HepC1a segment 12


90 nts 


SEQ ID NO: 430 


Polypeptide encoded by SEQ ID NO: 429 


30 aa


SEQ ID NO: 431


HepC1a segment 13


90 nts


SEQ ID NO: 432


Polypeptide encoded by SEQ ID NO: 431


30 aa


SEQ ID NO: 433


HepC1a segment 14


90 nts


SEQ ID NO: 434


Polypeptide encoded by SEQ ID NO: 433


30 aa


SEQ ID NO: 435


HepC1a segment 15


90 nts


SEQ ID NO: 436


Polypeptide encoded by SEQ ID NO: 435


30 aa


SEQ ID NO: 437


HepC1a segment 16


90 nts


SEQ ID NO: 438


Polypeptide encoded by SEQ ID NO: 437


30 aa


SEQ ID NO: 439


HepC1a segment 17


90 nts


SEQ ID NO: 440


Polypeptide encoded by SEQ ID NO: 439


30 aa


SEQ ID NO: 441


HepC1a segment 18


90 nts


SEQ ID NO: 442


Polypeptide encoded by SEQ ID NO: 441


30 aa


SEQ ID NO: 443


HepC1a segment 19


90 nts


SEQ ID NO: 444


Polypeptide encoded by SEQ ID NO: 443


30 aa


SEQ ID NO: 445


HepC1a segment 20


90 nts


SEQ ID NO: 446


Polypeptide encoded by SEQ ID NO: 445


30 aa


SEQ ID NO: 447


HepC1a segment 21


90 nts


SEQ ID NO: 448


Polypeptide encoded by SEQ ID NO: 447


30 aa


SEQ ID NO: 449


HepC1a segment 22


90 nts


SEQ ID NO: 450


Polypeptide encoded by SEQ ID NO: 449


30 aa


SEQ ID NO: 451


HepC1a segment 23


90 nts


SEQ ID NO: 452


Polypeptide encoded by SEQ ID NO: 451


30 aa


SEQ ID NO: 453


HepC1a segment 24


90 nts


SEQ ID NO: 454


Polypeptide encoded by SEQ ID NO: 453


30 aa


SEQ ID NO: 455


HepC1a segment 25


90 nts 


SEQ ID NO: 456 


Polypeptide encoded by SEQ ID NO: 455 


30 aa 


SEQ ID NO: 457 


HepC1a segment 26


90 nts 


SEQ ID NO: 458 


Polypeptide encoded by SEQ ID NO: 457 


30 aa 


SEQ ID NO: 459 


HepC1a segment 27


90 nts 


SEQ ID NO: 460 


Polypeptide encoded by SEQ ID NO: 459 


30 aa 


SEQ ID NO: 461 


HepC1a segment 28


90 nts 


SEQ ID NO: 462 


Polypeptide encoded by SEQ ID NO: 461 


30 aa 


SEQ ID NO: 463 


HepC1a segment 29


90 nts 


SEQ ID NO: 464 


Polypeptide encoded by SEQ ID NO: 463 


30 aa 


SEQ ID NO: 465 


HepC1a segment 30


90 nts 


SEQ ID NO: 466 


Polypeptide encoded by SEQ ID NO: 465 


30 aa 


SEQ ID NO: 467


HepC1a segment 31


90 nts 


SEQ ID NO: 468 


Polypeptide encoded by SEQ ID NO: 467 


30 aa 


SEQ ID NO: 469


HepC1a segment 32


90 nts 


SEQ ID NO: 470 


Polypeptide encoded by SEQ ID NO: 469 


30 aa 


SEQ ID NO: 471 


HepC1a segment 33


90 nts 


SEQ ID NO: 472 


Polypeptide encoded by SEQ ID NO: 471 


30 aa 


SEQ ID NO: 473 


HepC1a segment 34


90 nts 


SEQ ID NO: 474


Polypeptide encoded by SEQ ID NO: 473 


30 aa 


SEQ ID NO: 475


HepC1a segment 35


90 nts 


SEQ ID NO: 476


Polypeptide encoded by SEQ ID NO: 475 


30 aa 


SEQ ID NO: 477


HepC1a segment 36


90 nts 


SEQ ID NO: 478


Polypeptide encoded by SEQ ID NO: 477 


30 aa


SEQ ID NO: 479


HepC1a segment 37


90 nts 


SEQ ID NO: 480


Polypeptide encoded by SEQ ID NO: 479 


30 aa 


SEQ ID NO: 481 


HepC1a segment 38


90 nts 


SEQ ID NO: 482 


Polypeptide encoded by SEQ ID NO: 481 


30 aa 


SEQ ID NO: 483 


HepC1a segment 39


90 nts 


SEQ ID NO: 484 


Polypeptide encoded by SEQ ID NO: 483 


30 aa 


SEQ ID NO: 485 


HepC1a segment 40


90 nts 


SEQ ID NO: 486 


Polypeptide encoded by SEQ ID NO: 485 


30 aa 


SEQ ID NO: 487 


HepC1a segment 41


90 nts 


SEQ ID NO: 488 


Polypeptide encoded by SEQ ID NO: 487 


30 aa 


SEQ ID NO: 489 


HepC1a segment 42


90 nts 


SEQ ID NO: 490 


Polypeptide encoded by SEQ ID NO: 489 


30 aa 


SEQ ID NO: 491 


HepC1a segment 43


90 nts 


SEQ ID NO: 492 


Polypeptide encoded by SEQ ID NO: 491 


30 aa 


SEQ ID NO: 493 


HepC1a segment 44


90 nts 


SEQ ID NO: 494


Polypeptide encoded by SEQ ID NO: 493 


30 aa 


SEQ ID NO: 495 


HepC1a segment 45


90 nts 


SEQ ID NO: 496


Polypeptide encoded by SEQ ID NO: 495 


30 aa 


SEQ ID NO: 497


HepC1a segment 46


90 nts 


SEQ ID NO: 498


Polypeptide encoded by SEQ ID NO: 497 


30 aa 


SEQ ID NO: 499


HepC1a segment 47


90 nts 


SEQ ID NO: 500 


Polypeptide encoded by SEQ ID NO: 499 


30 aa 


SEQ ID NO: 501


HepC1a segment 48


90 nts 


SEQ ID NO: 502 


Polypeptide encoded by SEQ ID NO: 501 


30 aa


SEQ ID NO: 503


HepC1a segment 49


90 nts


SEQ ID NO: 504


Polypeptide encoded by SEQ ID NO: 503


30 aa 


SEQ ID NO: 505


HepC1a segment 50


90 nts


SEQ ID NO: 506


Polypeptide encoded by SEQ ID NO: 505 


30 aa 


SEQ ID NO: 507


HepC1a segment 51


90 nts 


SEQ ID NO: 508 


Polypeptide encoded by SEQ ID NO: 507 


30 aa 


SEQ ID NO: 509 


HepC1a segment 52


90 nts


SEQ ID NO: 510


Polypeptide encoded by SEQ ID NO: 509 


30 aa 


SEQ ID NO: 511


HepC1a segment 53


90 nts


SEQ ID NO: 512


Polypeptide encoded by SEQ ID NO: 511 


30 aa 


SEQ ID NO: 513


HepC1a segment 54


90 nts


SEQ ID NO: 514


Polypeptide encoded by SEQ ID NO: 513


30 aa 


SEQ ID NO: 515 


HepC1a segment 55


90 nts 


SEQ ID NO: 516 


Polypeptide encoded by SEQ ID NO: 515 


30 aa 


SEQ ID NO: 517 


HepC1a segment 56


90 nts 


SEQ ID NO: 518 


Polypeptide encoded by SEQ ID NO: 517 


30 aa


SEQ ID NO: 519


HepC1a segment 57


90 nts


SEQ ID NO: 520


Polypeptide encoded by SEQ ID NO: 519 


30 aa 


SEQ ID NO: 521


HepC1a segment 58


90 nts


SEQ ID NO: 522


Polypeptide encoded by SEQ ID NO: 521 


30 aa 


SEQ ID NO: 523 


HepC1a segment 59


90 nts


SEQ ID NO: 524


Polypeptide encoded by SEQ ID NO: 523 


30 aa 


SEQ ID NO: 525


HepC1a segment 60


90 nts


SEQ ID NO: 526


Polypeptide encoded by SEQ ID NO: 525 


30 aa


SEQ ID NO: 527


HepC1a segment 61


90 nts 


SEQ ID NO: 528 


Polypeptide encoded by SEQ ID NO: 527 


30 aa 


SEQ ID NO: 529 


HepC1a segment 62


90 nts 


SEQ ID NO: 530 


Polypeptide encoded by SEQ ID NO: 529 


30 aa 


SEQ ID NO: 531 


HepC1a segment 63


90 nts 


SEQ ID NO: 532 


Polypeptide encoded by SEQ ID NO: 531 


30 aa 


SEQ ID NO: 533 


HepC1a segment 64


90 nts 


SEQ ID NO: 534 


Polypeptide encoded by SEQ ID NO: 533 


30 aa 


SEQ ID NO: 535 


HepC1a segment 65


90 nts 


SEQ ID NO: 536 


Polypeptide encoded by SEQ ID NO: 535 


30 aa 


SEQ ID NO: 537 


HepC1a segment 66


90 nts 


SEQ ID NO: 538 


Polypeptide encoded by SEQ ID NO: 537 


30 aa 


SEQ ID NO: 539 


HepC1a segment 67


90 nts 


SEQ ID NO: 540 


Polypeptide encoded by SEQ ID NO: 539 


30 aa 


SEQ ID NO: 541 


HepC1a segment 68


90 nts 


SEQ ID NO: 542 


Polypeptide encoded by SEQ ID NO: 541 


30 aa 


SEQ ID NO: 543 


HepC1a segment 69


90 nts 


SEQ ID NO: 544 


Polypeptide encoded by SEQ ID NO: 543 


30 aa 


SEQ ID NO: 545 


HepC1a segment 70


90 nts 


SEQ ID NO: 546 


Polypeptide encoded by SEQ ID NO: 545


30 aa 


SEQ ID NO: 547 


HepC1a segment 71


90 nts 


SEQ ID NO: 548 


Polypeptide encoded by SEQ ID NO: 547 


30 aa 


SEQ ID NO: 549 


HepC1a segment 72


90 nts 


SEQ ID NO: 550 


Polypeptide encoded by SEQ ID NO: 549 


30 aa


SEQ ID NO: 551


HepC1a segment 73


90 nts


SEQ ID NO: 552 


Polypeptide encoded by SEQ ID NO: 551 


30 aa 


SEQ ID NO: 553 


HepC1a segment 74


90 nts 


SEQ ID NO: 554 


Polypeptide encoded by SEQ ID NO: 553 


30 aa 


SEQ ID NO: 555 


HepC1a segment 75


90 nts 


SEQ ID NO: 556 


Polypeptide encoded by SEQ ID NO: 555 


30 aa 


SEQ ID NO: 557 


HepC1a segment 76


90 nts 


SEQ ID NO: 558 


Polypeptide encoded by SEQ ID NO: 557 


30 aa 


SEQ ID NO: 559 


HepC1a segment 77


90 nts 


SEQ ID NO: 560 


Polypeptide encoded by SEQ ID NO: 559 


30 aa 


SEQ ID NO: 561 


HepC1a segment 78


90 nts 


SEQ ID NO: 562 


Polypeptide encoded by SEQ ID NO: 561 


30 aa 


SEQ ID NO: 563


HepC1a segment 79


90 nts 


SEQ ID NO: 564 


Polypeptide encoded by SEQ ID NO: 563 


30 aa 


SEQ ID NO: 565 


HepC1a segment 80


90 nts 


SEQ ID NO: 566 


Polypeptide encoded by SEQ ID NO: 565 


30 aa 


SEQ ID NO: 567 


HepC1a segment 81


90 nts 


SEQ ID NO: 568 


Polypeptide encoded by SEQ ID NO: 567 


30 aa 


SEQ ID NO: 569


HepC1a segment 82


90 nts 


SEQ ID NO: 570 


Polypeptide encoded by SEQ ID NO: 569 


30 aa 


SEQ ID NO: 571


HepC1a segment 83


90 nts 


SEQ ID NO: 572 


Polypeptide encoded by SEQ ID NO: 571 


30 aa 


SEQ ID NO: 573 


HepC1a segment 84


90 nts 


SEQ ID NO: 574 


Polypeptide encoded by SEQ ID NO: 573 


30 aa


SEQ ID NO: 575


HepC1a segment 85


90 nts 


SEQ ID NO: 576 


Polypeptide encoded by SEQ ID NO: 575 


30 aa 


SEQ ID NO: 577 


HepC1a segment 86


90 nts 


SEQ ID NO: 578 


Polypeptide encoded by SEQ ID NO: 577 


30 aa 


SEQ ID NO: 579 


HepC1a segment 87


90 nts 


SEQ ID NO: 580 


Polypeptide encoded by SEQ ID NO: 579 


30 aa 


SEQ ID NO: 581 


HepC1a segment 88


90 nts 


SEQ ID NO: 582 


Polypeptide encoded by SEQ ID NO: 581 


30 aa 


SEQ ID NO: 583 


HepC1a segment 89


90 nts 


SEQ ID NO: 584 


Polypeptide encoded by SEQ ID NO: 583 


30 aa 


SEQ ID NO: 585 


HepC1a segment 90


90 nts 


SEQ ID NO: 586 


Polypeptide encoded by SEQ ID NO: 585 


30 aa 


SEQ ID NO: 587 


HepC1a segment 91


90 nts 


SEQ ID NO: 588 


Polypeptide encoded by SEQ ID NO: 587 


30 aa 


SEQ ID NO: 589 


HepC1a segment 92


90 nts 


SEQ ID NO: 590 


Polypeptide encoded by SEQ ID NO: 589 


30 aa


SEQ ID NO: 591


HepC1a segment 93


90 nts


SEQ ID NO: 592 


Polypeptide encoded by SEQ ID NO: 591 


30 aa 


SEQ ID NO: 593 


HepC1a segment 94


90 nts 


SEQ ID NO: 594 


Polypeptide encoded by SEQ ID NO: 593 


30 aa 


SEQ ID NO: 595 


HepC1a segment 95


90 nts 


SEQ ID NO: 596 


Polypeptide encoded by SEQ ID NO: 595 


30 aa 


SEQ ID NO: 597 


HepC1a segment 96


90 nts 


SEQ ID NO: 598 


Polypeptide encoded by SEQ ID NO: 597 


30 aa 



SEQ ID NO: 599 


HepC1a segment 97


90 nts 


SEQ ID NO: 600 


Polypeptide encoded by SEQ ID NO: 599 


30 aa 


SEQ ID NO: 601 


HepC1a segment 98


90 nts 


SEQ ID NO: 602 


Polypeptide encoded by SEQ ID NO: 601 


30 aa 


SEQ ID NO: 603 


HepC1a segment 99


90 nts 


SEQ ID NO: 604 


Polypeptide encoded by SEQ ID NO: 603 


30 aa 


SEQ ID NO: 605 


HepC1a segment 100


90 nts 


SEQ ID NO: 606 


Polypeptide encoded by SEQ ID NO: 605


30 aa 


SEQ ID NO: 607 


HepC1a segment 101


90 nts


SEQ ID NO: 608 


Polypeptide encoded by SEQ ID NO: 607 


30 aa 


SEQ ID NO: 609 


HepC1a segment 102


90 nts


SEQ ID NO: 610


Polypeptide encoded by SEQ ID NO: 609 


30 aa 


SEQ ID NO: 611 


HepC1a segment 103


90 nts 


SEQ ID NO: 612 


Polypeptide encoded by SEQ ID NO: 611


30 aa 


SEQ ID NO: 613 


HepC1a segment 104


90 nts 


SEQ ID NO: 614 


Polypeptide encoded by SEQ ID NO: 613 


30 aa 


SEQ ID NO: 615 


HepC1a segment 105


90 nts 


SEQ ID NO: 616 


Polypeptide encoded by SEQ ID NO: 615 


30 aa 


SEQ ID NO: 617 


HepC1a segment 106


90 nts 


SEQ ID NO: 618 


Polypeptide encoded by SEQ ID NO: 617 


30 aa 


SEQ ID NO: 619 


HepC1a segment 107


90 nts 


SEQ ID NO: 620 


Polypeptide encoded by SEQ ID NO: 619 


30 aa 


SEQ ID NO: 621 


HepC1a segment 108


90 nts 


SEQ ID NO: 622 


Polypeptide encoded by SEQ ID NO: 621 


30 aa


SEQ ID NO: 623


HepC1a segment 109


90 nts 


SEQ ID NO: 624 


Polypeptide encoded by SEQ ID NO: 623 


30 aa 


SEQ ID NO: 625


HepC1a segment 110


90 nts 


SEQ ID NO: 626 


Polypeptide encoded by SEQ ID NO: 625 


30 aa 


SEQ ID NO: 627 


HepC1a segment 111


90 nts


SEQ ID NO: 628


Polypeptide encoded by SEQ ID NO: 627 


30 aa 


SEQ ID NO: 629 


HepC1a segment 112


90 nts 


SEQ ID NO: 630 


Polypeptide encoded by SEQ ID NO: 629 


30 aa 


SEQ ID NO: 631 


HepC1a segment 113


90 nts 


SEQ ID NO: 632 


Polypeptide encoded by SEQ ID NO: 631 


30 aa 


SEQ ID NO: 633 


HepC1a segment 114


90 nts 


SEQ ID NO: 634 


Polypeptide encoded by SEQ ID NO: 633


30 aa 


SEQ ID NO: 635 


HepC1a segment 115


90 nts 


SEQ ID NO: 636 


Polypeptide encoded by SEQ ID NO: 635 


30 aa 


SEQ ID NO: 637 


HepC1a segment 116


90 nts 


SEQ ID NO: 638 


Polypeptide encoded by SEQ ID NO: 637 


30 aa 


SEQ ID NO: 639


HepC1a segment 117


90 nts


SEQ ID NO: 640


Polypeptide encoded by SEQ ID NO: 639 


30 aa 


SEQ ID NO: 641 


HepC1a segment 118


90 nts


SEQ ID NO: 642


Polypeptide encoded by SEQ ID NO: 641 


30 aa 


SEQ ID NO: 643 


HepC1a segment 119


90 nts 


SEQ ID NO: 644 


Polypeptide encoded by SEQ ID NO: 643 


30 aa 


SEQ ID NO: 645


HepC1a segment 120


90 nts


SEQ ID NO: 646


Polypeptide encoded by SEQ ID NO: 645 


30 aa


SEQ ID NO: 647


HepC1a segment 121


90 nts 


SEQ ID NO: 648 


Polypeptide encoded by SEQ ID NO: 647 


30 aa 


SEQ ID NO: 649 


HepC1a segment 122


90 nts 


SEQ ID NO: 650 


Polypeptide encoded by SEQ ID NO: 649 


30 aa 


SEQ ID NO: 651 


HepC1a segment 123


90 nts 


SEQ ID NO: 652 


Polypeptide encoded by SEQ ID NO: 651 


30 aa 


SEQ ID NO: 653 


HepC1a segment 124


90 nts 


SEQ ID NO: 654 


Polypeptide encoded by SEQ ID NO: 653 


30 aa 


SEQ ID NO: 655 


HepC1a segment 125


90 nts 


SEQ ID NO: 656 


Polypeptide encoded by SEQ ID NO: 655 


30 aa 


SEQ ID NO: 657 


HepC1a segment 126


90 nts 


SEQ ID NO: 658 


Polypeptide encoded by SEQ ID NO: 657 


30 aa 


SEQ ID NO: 659 


HepC1a segment 127


90 nts 


SEQ ID NO: 660 


Polypeptide encoded by SEQ ID NO: 659 


30 aa 


SEQ ID NO: 661 


HepC1a segment 128


90 nts 


SEQ ID NO: 662 


Polypeptide encoded by SEQ ID NO: 661 


30 aa 


SEQ ID NO: 663 


HepC1a segment 129


90 nts 


SEQ ID NO: 664 


Polypeptide encoded by SEQ ID NO: 663 


30 aa 


SEQ ID NO: 665 


HepC1a segment 130


90 nts 


SEQ ID NO: 666 


Polypeptide encoded by SEQ ID NO: 665 


30 aa 


SEQ ID NO: 667 


HepC1a segment 131


90 nts 


SEQ ID NO: 668 


Polypeptide encoded by SEQ ID NO: 667 


30 aa 


SEQ ID NO: 669 


HepC1a segment 132


90 nts 


SEQ ID NO: 670 


Polypeptide encoded by SEQ ID NO: 669 


30 aa


SEQ ID NO: 671


HepC1a segment 133


90 nts 


SEQ ID NO: 672 


Polypeptide encoded by SEQ ID NO: 671 


30 aa 


SEQ ID NO: 673 


HepC1a segment 134


90 nts 


SEQ ID NO: 674 


Polypeptide encoded by SEQ ID NO: 673 


30 aa 


SEQ ID NO: 675 


HepC1a segment 135


90 nts 


SEQ ID NO: 676 


Polypeptide encoded by SEQ ID NO: 675 


30 aa 


SEQ ID NO: 677 


HepC1a segment 136


90 nts 


SEQ ID NO: 678 


Polypeptide encoded by SEQ ID NO: 677


30 aa 


SEQ ID NO: 679 


HepC1a segment 137


90 nts 


SEQ ID NO: 680 


Polypeptide encoded by SEQ ID NO: 679 


30 aa 


SEQ ID NO: 681


HepC1a segment 138


90 nts 


SEQ ID NO: 682 


Polypeptide encoded by SEQ ID NO: 681


30 aa 


SEQ ID NO: 683 


HepC1a segment 139


90 nts 


SEQ ID NO: 684 


Polypeptide encoded by SEQ ID NO: 683 


30 aa 


SEQ ID NO: 685 


HepC1a segment 140


90 nts 


SEQ ID NO: 686 


Polypeptide encoded by SEQ ID NO: 685 


30 aa 


SEQ ID NO: 687 


HepC1a segment 141


90 nts 


SEQ ID NO: 688 


Polypeptide encoded by SEQ ID NO: 687 


30 aa 


SEQ ID NO: 689 


HepC1a segment 142


90 nts 


SEQ ID NO: 690 


Polypeptide encoded by SEQ ID NO: 689


30 aa 


SEQ ID NO: 691


HepC1a segment 143


90 nts 


SEQ ID NO: 692 


Polypeptide encoded by SEQ ID NO: 691 


30 aa 


SEQ ID NO: 693


HepC1a segment 144


90 nts 


SEQ ID NO: 694 


Polypeptide encoded by SEQ ID NO: 693 


30 aa


SEQ ID NO: 695


HepC1a segment 145


90 nts 


SEQ ID NO: 696 


Polypeptide encoded by SEQ ID NO: 695 


30 aa 


SEQ ID NO: 697 


HepC1a segment 146


90 nts 


SEQ ID NO: 698 


Polypeptide encoded by SEQ ID NO: 697 


30 aa 


SEQ ID NO: 699 


HepC1a segment 147


90 nts 


SEQ ID NO: 700 


Polypeptide encoded by SEQ ID NO: 699 


30 aa 


SEQ ID NO: 701 


HepC1a segment 148


90 nts 


SEQ ID NO: 702 


Polypeptide encoded by SEQ ID NO: 701 


30 aa 


SEQ ID NO: 703 


HepC1a segment 149


90 nts 


SEQ ID NO: 704 


Polypeptide encoded by SEQ ID NO: 703 


30 aa 


SEQ ID NO: 705 


HepC1a segment 150


90 nts 


SEQ ID NO: 706 


Polypeptide encoded by SEQ ID NO: 705 


30 aa 


SEQ ID NO: 707 


HepC1a segment 151


90 nts 


SEQ ID NO: 708 


Polypeptide encoded by SEQ ID NO: 707 


30 aa 


SEQ ID NO: 709 


HepC1a segment 152


90 nts 


SEQ ID NO: 710 


Polypeptide encoded by SEQ ID NO: 709 


30 aa 


SEQ ID NO: 711 


HepC1a segment 153


90 nts


SEQ ID NO: 712


Polypeptide encoded by SEQ ID NO: 711 


30 aa 


SEQ ID NO: 713 


HepC1a segment 154


90 nts 


SEQ ID NO: 714 


Polypeptide encoded by SEQ ID NO: 713 


30 aa 


SEQ ID NO: 715


HepC1a segment 155


90 nts


SEQ ID NO: 716


Polypeptide encoded by SEQ ID NO: 715


30 aa 


SEQ ID NO: 717 


HepC1a segment 156


90 nts 


SEQ ID NO: 718 


Polypeptide encoded by SEQ ID NO: 717 


30 aa


SEQ ID NO: 719


HepC1a segment 157


90 nts


SEQ ID NO: 720


Polypeptide encoded by SEQ ID NO: 719 


30 aa 


SEQ ID NO: 721 


HepC1a segment 158


90 nts


SEQ ID NO: 722


Polypeptide encoded by SEQ ID NO: 721 


30 aa 


SEQ ID NO: 723 


HepC1a segment 159


90 nts 


SEQ ID NO: 724 


Polypeptide encoded by SEQ ID NO: 723 


30 aa 


SEQ ID NO: 725 


HepC1a segment 160


90 nts


SEQ ID NO: 726


Polypeptide encoded by SEQ ID NO: 725 


30 aa 


SEQ ID NO: 727 


HepC1a segment 161


90 nts


SEQ ID NO: 728


Polypeptide encoded by SEQ ID NO: 727 


30 aa 


SEQ ID NO: 729


HepC1a segment 162


90 nts


SEQ ID NO: 730


Polypeptide encoded by SEQ ID NO: 729 


30 aa 


SEQ ID NO: 731


HepC1a segment 163


90 nts


SEQ ID NO: 732


Polypeptide encoded by SEQ ID NO: 731 


30 aa 


SEQ ID NO: 733


HepC1a segment 164


90 nts


SEQ ID NO: 734


Polypeptide encoded by SEQ ID NO: 733 


30 aa 


SEQ ID NO: 735


HepC1a segment 165


90 nts


SEQ ID NO: 736


Polypeptide encoded by SEQ ID NO: 735 


30 aa 


SEQ ID NO: 737


HepC1a segment 166


90 nts


SEQ ID NO: 738


Polypeptide encoded by SEQ ID NO: 737 


30 aa 


SEQ ID NO: 739


HepC1a segment 167


90 nts


SEQ ID NO: 740


Polypeptide encoded by SEQ ID NO: 739 


30 aa 


SEQ ID NO: 741


HepC1a segment 168


90 nts


SEQ ID NO: 742


Polypeptide encoded by SEQ ID NO: 741 


30 aa


SEQ ID NO: 743


HepC1a segment 169


90 nts 


SEQ ID NO: 744 


Polypeptide encoded by SEQ ID NO: 743 


30 aa 


SEQ ID NO: 745 


HepC1a segment 170


90 nts 


SEQ ID NO: 746 


Polypeptide encoded by SEQ ID NO: 745 


30 aa 


SEQ ID NO: 747 


HepC1a segment 171


90 nts 


SEQ ID NO: 748 


Polypeptide encoded by SEQ ID NO: 747 


30 aa 


SEQ ID NO: 749 


HepC1a segment 172


90 nts 


SEQ ID NO: 750 


Polypeptide encoded by SEQ ID NO: 749 


30 aa 


SEQ ID NO: 751 


HepC1a segment 173


90 nts 


SEQ ID NO: 752 


Polypeptide encoded by SEQ ID NO: 751 


30 aa 


SEQ ID NO: 753 


HepC1a segment 174


90 nts 


SEQ ID NO: 754 


Polypeptide encoded by SEQ ID NO: 753 


30 aa 


SEQ ID NO: 755 


HepC1a segment 175


90 nts 


SEQ ID NO: 756 


Polypeptide encoded by SEQ ID NO: 755 


30 aa 


SEQ ID NO: 757 


HepC1a segment 176


90 nts 


SEQ ID NO: 758 


Polypeptide encoded by SEQ ID NO: 757 


30 aa 


SEQ ID NO: 759


HepC1a segment 177


90 nts 


SEQ ID NO: 760 


Polypeptide encoded by SEQ ID NO: 759 


30 aa 


SEQ ID NO: 761 


HepC1a segment 178


90 nts 


SEQ ID NO: 762 


Polypeptide encoded by SEQ ID NO: 761 


30 aa 


SEQ ID NO: 763 


HepC1a segment 179


90 nts 


SEQ ID NO: 764 


Polypeptide encoded by SEQ ID NO: 763 


30 aa 


SEQ ID NO: 765 


HepC1a segment 180


90 nts 


SEQ ID NO: 766 


Polypeptide encoded by SEQ ID NO: 765 


30 aa


SEQ ID NO: 767


HepC1a segment 181


90 nts


SEQ ID NO: 768 


Polypeptide encoded by SEQ ID NO: 767 


30 aa 


SEQ ID NO: 769 


HepC1a segment 182


90 nts 


SEQ ID NO: 770 


Polypeptide encoded by SEQ ID NO: 769 


30 aa 


SEQ ID NO: 771 


HepC1a segment 183


90 nts 


SEQ ID NO: 772 


Polypeptide encoded by SEQ ID NO: 771 


30 aa 


SEQ ID NO: 773 


HepC1a segment 184


90 nts 


SEQ ID NO: 774 


Polypeptide encoded by SEQ ID NO: 773 


30 aa 


SEQ ID NO: 775 


HepC1a segment 185


90 nts 


SEQ ID NO: 776 


Polypeptide encoded by SEQ ID NO: 775 


30 aa 


SEQ ID NO: 777 


HepC1a segment 186


90 nts 


SEQ ID NO: 778 


Polypeptide encoded by SEQ ID NO: 777 


30 aa 


SEQ ID NO: 779 


HepC1a segment 187


90 nts 


SEQ ID NO: 780 


Polypeptide encoded by SEQ ID NO: 779 


30 aa 


SEQ ID NO: 781 


HepC1a segment 188


90 nts 


SEQ ID NO: 782 


Polypeptide encoded by SEQ ID NO: 781 


30 aa 


SEQ ID NO: 783 


HepC1a segment 189


90 nts 


SEQ ID NO: 784 


Polypeptide encoded by SEQ ID NO: 783 


30 aa 


SEQ ID NO: 785 


HepC1a segment 190


90 nts 


SEQ ID NO: 786 


Polypeptide encoded by SEQ ID NO: 785 


30 aa 


SEQ ID NO: 787 


HepC1a segment 191


90 nts 


SEQ ID NO: 788 


Polypeptide encoded by SEQ ID NO: 787 


30 aa 


SEQ ID NO: 789 


HepC1a segment 192


90 nts 


SEQ ID NO: 790 


Polypeptide encoded by SEQ ID NO: 789 


30 aa


SEQ ID NO: 791


HepC1a segment 193


90 nts 


SEQ ID NO: 792 


Polypeptide encoded by SEQ ID NO: 791 


30 aa 


SEQ ID NO: 793 


HepC1a segment 194


90 nts 


SEQ ID NO: 794 


Polypeptide encoded by SEQ ID NO: 793 


30 aa 


SEQ ID NO: 795 


HepC1a segment 195


90 nts 


SEQ ID NO: 796 


Polypeptide encoded by SEQ ID NO: 795 


30 aa 


SEQ ID NO: 797 


HepC1a segment 196


90 nts 


SEQ ID NO: 798 


Polypeptide encoded by SEQ ID NO: 797 


30 aa 


SEQ ID NO: 799 


HepC1a segment 197


90 nts 


SEQ ID NO: 800 


Polypeptide encoded by SEQ ID NO: 799 


30 aa 


SEQ ID NO: 801


HepC1a segment 198


90 nts 


SEQ ID NO: 802 


Polypeptide encoded by SEQ ID NO: 801 


30 aa 


SEQ ID NO: 803 


HepC1a segment 199


90 nts 


SEQ ID NO: 804 


Polypeptide encoded by SEQ ID NO: 803 


30 aa 


SEQ ID NO: 805 


HepC1a segment 200


90 nts 


SEQ ID NO: 806 


Polypeptide encoded by SEQ ID NO: 805 


30 aa 


SEQ ID NO: 807 


HepC1a segment 201


45 nts 


SEQ ID NO: 808 


Polypeptide encoded by SEQ ID NO: 807 


15 aa 


SEQ ID NO: 809


HepC1a scrambled


17955 nts


SEQ ID NO: 810


Polypeptide encoded by SEQ ID NO: 809


5985 aa 


SEQ ID NO: 811


HepC Cassette A


6065 nts


SEQ ID NO: 812


Polypeptide encoded by SEQ ID NO: 811 


2011 aa 


SEQ ID NO: 813


HepC Cassette B


6069 nts


SEQ ID NO: 814


Polypeptide encoded by SEQ ID NO: 813 


2010 aa


SEQ ID NO: 815


HepC Cassette C


6030 nts


SEQ ID NO: 816


Polypeptide encoded by SEQ ID NO: 815 


1997 aa 


SEQ ID NO: 817


gp100 consensus polypeptide


661 aa 


SEQ ID NO: 818


MART consensus polypeptide


118 aa


SEQ ID NO: 819


TRP-1 consensus polypeptide 


248 aa 


SEQ ID NO: 820 


Tyros consensus polypeptide 


529 aa 


SEQ ID NO: 821


TRP2 consensus polypeptide 


519 aa 


SEQ ID NO: 822 


MC1R consensus polypeptide 


317 aa 


SEQ ID NO: 823 


MUC1F consensus polypeptide 


125 aa 


SEQ ID NO: 824 


MUC1R consensus polypeptide 


312 aa 


SEQ ID NO: 825 


BAGE consensus polypeptide 


43 aa 


SEQ ID NO: 826 


GAGE-1 consensus polypeptide 


138 aa 


SEQ ID NO: 827 


gp100In4 consensus polypeptide


51 aa 


SEQ ID NO: 828 


MAGE-1 consensus polypeptide 


309 aa 


SEQ ID NO: 829 


MAGE-3 consensus polypeptide 


314 aa 


SEQ ID NO: 830


PRAME consensus polypeptide 


509 aa 


SEQ ID NO: 831


TRP2IN2 consensus polypeptide


54 aa 


SEQ ID NO: 832 


NYNSO1a consensus polypeptide


180 aa 


SEQ ID NO: 833 


NYNSO1b consensus polypeptide


58 aa 


SEQ ID NO: 834 


LAGE1 consensus polypeptide 


180 aa 


SEQ ID NO: 835 


gp100 segment 1


90 nts 


SEQ ID NO: 836 


Polypeptide encoded by SEQ ID NO: 835 


30 aa 


SEQ ID NO: 837 


gp100 segment 2


90 nts 


SEQ ID NO: 838 


Polypeptide encoded by SEQ ID NO: 837 


30 aa


SEQ ID NO: 839


gp100 segment 3


90 nts 


SEQ ID NO: 840 


Polypeptide encoded by SEQ ID NO: 839 


30 aa 


SEQ ID NO: 841 


gp100 segment 4


90 nts 


SEQ ID NO: 842 


Polypeptide encoded by SEQ ID NO: 841 


30 aa 


SEQ ID NO: 843 


gp100 segment 5


90 nts


SEQ ID NO: 844 


Polypeptide encoded by SEQ ID NO: 843 


30 aa 


SEQ ID NO: 845 


gp100 segment 6


90 nts 


SEQ ID NO: 846 


Polypeptide encoded by SEQ ID NO: 845 


30 aa 


SEQ ID NO: 847 


gp100 segment 7


90 nts 


SEQ ID NO: 848 


Polypeptide encoded by SEQ ID NO: 847


30 aa 


SEQ ID NO: 849 


gp100 segment 8


90 nts 


SEQ ID NO: 850 


Polypeptide encoded by SEQ ID NO: 849 


30 aa 


SEQ ID NO: 851 


gp100 segment 9


90 nts 


SEQ ID NO: 852 


Polypeptide encoded by SEQ ID NO: 851


30 aa 


SEQ ID NO: 853 


gp100 segment 10


90 nts 


SEQ ID NO: 854 


Polypeptide encoded by SEQ ID NO: 853 


30 aa 


SEQ ID NO: 855 


gp100 segment 11


90 nts 


SEQ ID NO: 856 


Polypeptide encoded by SEQ ID NO: 855 


30 aa 


SEQ ID NO: 857 


gp100 segment 12


90 nts 


SEQ ID NO: 858 


Polypeptide encoded by SEQ ID NO: 857 


30 aa 


SEQ ID NO: 859 


gp100 segment 13


90 nts 


SEQ ID NO: 860 


Polypeptide encoded by SEQ ID NO: 859 


30 aa 


SEQ ID NO: 861 


gp100 segment 14


90 nts 


SEQ ID NO: 862 


Polypeptide encoded by SEQ ID NO: 861 


30 aa


SEQ ID NO: 863


gp100 segment 15


90 nts 


SEQ ID NO: 864 


Polypeptide encoded by SEQ ID NO: 863 


30 aa 


SEQ ID NO: 865 


gp100 segment 16


90 nts 


SEQ ID NO: 866 


Polypeptide encoded by SEQ ID NO: 865 


30 aa 


SEQ ID NO: 867 


gp100 segment 17


90 nts 


SEQ ID NO: 868 


Polypeptide encoded by SEQ ID NO: 867 


30 aa 


SEQ ID NO: 869 


gp100 segment 18


90 nts 


SEQ ID NO: 870 


Polypeptide encoded by SEQ ID NO: 869 


30 aa 


SEQ ID NO: 871 


gp100 segment 19


90 nts 


SEQ ID NO: 872 


Polypeptide encoded by SEQ ID NO: 871 


30 aa 


SEQ ID NO: 873 


gp100 segment 20


90 nts 


SEQ ID NO: 874 


Polypeptide encoded by SEQ ID NO: 873 


30 aa 


SEQ ID NO: 875 


gp100 segment 21


90 nts 


SEQ ID NO: 876 


Polypeptide encoded by SEQ ID NO: 875 


30 aa 


SEQ ID NO: 877 


gp100 segment 22


90 nts 


SEQ ID NO: 878 


Polypeptide encoded by SEQ ID NO: 877 


30 aa


SEQ ID NO: 879


gp100 segment 23


90 nts 


SEQ ID NO: 880 


Polypeptide encoded by SEQ ID NO: 879 


30 aa


SEQ ID NO: 881


gp100 segment 24


90 nts 


SEQ ID NO: 882 


Polypeptide encoded by SEQ ID NO: 881 


30 aa 


SEQ ID NO: 883 


gp100 segment 25


90 nts 


SEQ ID NO: 884 


Polypeptide encoded by SEQ ID NO: 883 


30 aa 


SEQ ID NO: 885 


gp100 segment 26


90 nts 


SEQ ID NO: 886 


Polypeptide encoded by SEQ ID NO: 885 


30 aa


SEQ ID NO: 887


gp100 segment 27


90 nts 


SEQ ID NO: 888 


Polypeptide encoded by SEQ ID NO: 887 


30 aa 


SEQ ID NO: 889 


gp100 segment 28


90 nts 


SEQ ID NO: 890 


Polypeptide encoded by SEQ ID NO: 889 


30 aa 


SEQ ID NO: 891 


gp100 segment 29


90 nts 


SEQ ID NO: 892 


Polypeptide encoded by SEQ ID NO: 891 


30 aa 


SEQ ID NO: 893 


gp100 segment 30


90 nts 


SEQ ID NO: 894 


Polypeptide encoded by SEQ ID NO: 893 


30 aa 


SEQ ID NO: 895 


gp100 segment 31


90 nts 


SEQ ID NO: 896 


Polypeptide encoded by SEQ ID NO: 895 


30 aa 


SEQ ID NO: 897


gp100 segment 32


90 nts 


SEQ ID NO: 898 


Polypeptide encoded by SEQ ID NO: 897 


30 aa 


SEQ ID NO: 899 


gp100 segment 33


90 nts 


SEQ ID NO: 900 


Polypeptide encoded by SEQ ID NO: 899 


30 aa 


SEQ ID NO: 901 


gp100 segment 34


90 nts


SEQ ID NO: 902


Polypeptide encoded by SEQ ID NO: 901


30 aa 


SEQ ID NO: 903 


gp100 segment 35


90 nts 


SEQ ID NO: 904 


Polypeptide encoded by SEQ ID NO: 903


30 aa 


SEQ ID NO: 905 


gp100 segment 36


90 nts 


SEQ ID NO: 906 


Polypeptide encoded by SEQ ID NO: 905 


30 aa 


SEQ ID NO: 907 


gp100 segment 37


90 nts 


SEQ ID NO: 908 


Polypeptide encoded by SEQ ID NO: 907


30 aa 


SEQ ID NO: 909 


gp100 segment 38


90 nts 


SEQ ID NO: 910 


Polypeptide encoded by SEQ ID NO: 909 


30 aa


SEQ ID NO: 911


gp100 segment 39


90 nts


SEQ ID NO: 912


Polypeptide encoded by SEQ ID NO: 911


30 aa 


SEQ ID NO: 913 


gp100 segment 40


90 nts


SEQ ID NO: 914


Polypeptide encoded by SEQ ID NO: 913 


30 aa 


SEQ ID NO: 915 


gp100 segment 41


90 nts 


SEQ ID NO: 916 


Polypeptide encoded by SEQ ID NO: 915 


30 aa 


SEQ ID NO: 917


gp100 segment 42


90 nts


SEQ ID NO: 918


Polypeptide encoded by SEQ ID NO: 917 


30 aa 


SEQ ID NO: 919


gp100 segment 43


90 nts 


SEQ ID NO: 920 


Polypeptide encoded by SEQ ID NO: 919 


30 aa 


SEQ ID NO: 921


gp100 segment 44


60 nts


SEQ ID NO: 922


Polypeptide encoded by SEQ ID NO: 921


20 aa 


SEQ ID NO: 923 


MART segment 1 


90 nts 


SEQ ID NO: 924 


Polypeptide encoded by SEQ ID NO: 923 


30 aa 


SEQ ID NO: 925 


MART segment 2 


90 nts 


SEQ ID NO: 926 


Polypeptide encoded by SEQ ID NO: 925 


30 aa 


SEQ ID NO: 927 


MART segment 3 


90 nts 


SEQ ID NO: 928 


Polypeptide encoded by SEQ ID NO: 927


30 aa 


SEQ ID NO: 929 


MART segment 4 


90 nts 


SEQ ID NO: 930 


Polypeptide encoded by SEQ ID NO: 929 


30 aa 


SEQ ID NO: 931


MART segment 5 


90 nts 


SEQ ID NO: 932 


Polypeptide encoded by SEQ ID NO: 931 


30 aa 


SEQ ID NO: 933 


MART segment 6 


90 nts 


SEQ ID NO: 934 


Polypeptide encoded by SEQ ID NO: 933 


30 aa


SEQ ID NO: 935


MART segment 7 


90 nts 


SEQ ID NO: 936 


Polypeptide encoded by SEQ ID NO: 935 


30 aa 


SEQ ID NO: 937 


MART segment 8 


51 nts 


SEQ ID NO: 938 


Polypeptide encoded by SEQ ID NO: 937 


17 aa 


SEQ ID NO: 939 


trp-1 segment 1 


90 nts 


SEQ ID NO: 940 


Polypeptide encoded by SEQ ID NO: 939 


30 aa 


SEQ ID NO: 941 


trp-1 segment 2 


90 nts 


SEQ ID NO: 942 


Polypeptide encoded by SEQ ID NO: 941 


30 aa 


SEQ ID NO: 943 


trp-1 segment 3 


90 nts 


SEQ ID NO: 944 


Polypeptide encoded by SEQ ID NO: 943 


30 aa 


SEQ ID NO: 945 


trp-1 segment 4 


90 nts 


SEQ ID NO: 946 


Polypeptide encoded by SEQ ID NO: 945 


30 aa 


SEQ ID NO: 947 


trp-1 segment 5 


90 nts 


SEQ ID NO: 948 


Polypeptide encoded by SEQ ID NO: 947 


30 aa 


SEQ ID NO: 949 


trp-1 segment 6


90 nts 


SEQ ID NO: 950 


Polypeptide encoded by SEQ ID NO: 949 


30 aa 


SEQ ID NO: 951 


trp-1 segment 7 


90 nts 


SEQ ID NO: 952 


Polypeptide encoded by SEQ ID NO: 951 


30 aa 


SEQ ID NO: 953 


trp-1 segment 8 


90 nts 


SEQ ID NO: 954 


Polypeptide encoded by SEQ ID NO: 953 


30 aa 


SEQ ID NO: 955 


trp-1 segment 9 


90 nts 


SEQ ID NO: 956 


Polypeptide encoded by SEQ ID NO: 955 


30 aa 


SEQ ID NO: 957 


trp-1 segment 10 


90 nts 


SEQ ID NO: 958 


Polypeptide encoded by SEQ ID NO: 957 


30 aa


SEQ ID NO: 959


trp-1 segment 11


90 nts


SEQ ID NO: 960


Polypeptide encoded by SEQ ID NO: 959 


30 aa 


SEQ ID NO: 961 


trp-1 segment 12


90 nts 


SEQ ID NO: 962 


Polypeptide encoded by SEQ ID NO: 961 


30 aa 


SEQ ID NO: 963 


trp-1 segment 13


90 nts 


SEQ ID NO: 964 


Polypeptide encoded by SEQ ID NO: 963 


30 aa 


SEQ ID NO: 965


trp-1 segment 14


90 nts 


SEQ ID NO: 966 


Polypeptide encoded by SEQ ID NO: 965 


30 aa 


SEQ ID NO: 967 


trp-1 segment 15


90 nts 


SEQ ID NO: 968 


Polypeptide encoded by SEQ ID NO: 967 


30 aa 


SEQ ID NO: 969 


trp-1 segment 16


81 nts 


SEQ ID NO: 970 


Polypeptide encoded by SEQ ID NO: 969


27 aa 


SEQ ID NO: 971


tyros segment 1 


90 nts 


SEQ ID NO: 972 


Polypeptide encoded by SEQ ID NO: 971


30 aa 


SEQ ID NO: 973 


tyros segment 2 


90 nts 


SEQ ID NO: 974 


Polypeptide encoded by SEQ ID NO: 973 


30 aa 


SEQ ID NO: 975 


tyros segment 3 


90 nts 


SEQ ID NO: 976 


Polypeptide encoded by SEQ ID NO: 975 


30 aa 


SEQ ID NO: 977 


tyros segment 4 


90 nts 


SEQ ID NO: 978 


Polypeptide encoded by SEQ ID NO: 977 


30 aa 


SEQ ID NO: 979 


tyros segment 5 


90 nts 


SEQ ID NO: 980 


Polypeptide encoded by SEQ ID NO: 979 


30 aa 


SEQ ID NO: 981


tyros segment 6 


90 nts 


SEQ ID NO: 982 


Polypeptide encoded by SEQ ID NO: 981 


30 aa


SEQ ID NO: 983 


tyros segment 7 


90 nts 


SEQ ID NO: 984


Polypeptide encoded by SEQ ID NO: 983


30 aa 


SEQ ID NO: 985 


tyros segment 8 


90 nts 


SEQ ID NO: 986 


Polypeptide encoded by SEQ ID NO: 985


30 aa 


SEQ ID NO: 987 


tyros segment 9 


90 nts 


SEQ ID NO: 988 


Polypeptide encoded by SEQ ID NO: 987


30 aa 


SEQ ID NO: 989 


tyros segment 10 


90 nts 


SEQ ID NO: 990 


Polypeptide encoded by SEQ ID NO: 989 


30 aa 


SEQ ID NO: 991 


tyros segment 11


90 nts


SEQ ID NO: 992


Polypeptide encoded by SEQ ID NO: 991


30 aa 


SEQ ID NO: 993 


tyros segment 12 


90 nts 


SEQ ID NO: 994 


Polypeptide encoded by SEQ ID NO: 993 


30 aa 


SEQ ID NO: 995


tyros segment 13


90 nts


SEQ ID NO: 996


Polypeptide encoded by SEQ ID NO: 995 


30 aa 


SEQ ID NO: 997


tyros segment 14


90 nts


SEQ ID NO: 998


Polypeptide encoded by SEQ ID NO: 997 


30 aa 


SEQ ID NO: 999


tyros segment 15


90 nts


SEQ ID NO: 1000


Polypeptide encoded by SEQ ID NO: 999 


30 aa 


SEQ ID NO: 1001


tyros segment 16 


90 nts 


SEQ ID NO: 1002 


Polypeptide encoded by SEQ ID NO: 1001 


30 aa 


SEQ ID NO: 1003 


tyros segment 17 


90 nts


SEQ ID NO: 1004


Polypeptide encoded by SEQ ID NO: 1003 


30 aa 


SEQ ID NO: 1005 


tyros segment 18


90 nts


SEQ ID NO: 1006


Polypeptide encoded by SEQ ID NO: 1005


30 aa



SEQ ID NO: 1007


tyros segment 19


90 nts


SEQ ID NO: 1008


Polypeptide encoded by SEQ ID NO: 1007


30 aa


SEQ ID NO: 1009


tyros segment 20


90 nts


SEQ ID NO: 1010


Polypeptide encoded by SEQ ID NO: 1009


30 aa


SEQ ID NO: 1011


tyros segment 21


90 nts


SEQ ID NO: 1012


Polypeptide encoded by SEQ ID NO: 1011


30 aa


SEQ ID NO: 1013


tyros segment 22


90 nts


SEQ ID NO: 1014


Polypeptide encoded by SEQ ID NO: 1013


30 aa


SEQ ID NO: 1015


tyros segment 23


90 nts


SEQ ID NO: 1016


Polypeptide encoded by SEQ ID NO: 1015


30 aa


SEQ ID NO: 1017


tyros segment 24


90 nts


SEQ ID NO: 1018


Polypeptide encoded by SEQ ID NO: 1017


30 aa


SEQ ID NO: 1019


tyros segment 25


90 nts


SEQ ID NO: 1020


Polypeptide encoded by SEQ ID NO: 1019


30 aa


SEQ ID NO: 1021


tyros segment 26


90 nts


SEQ ID NO: 1022


Polypeptide encoded by SEQ ID NO: 1021


30 aa


SEQ ID NO: 1023


tyros segment 27


90 nts


SEQ ID NO: 1024


Polypeptide encoded by SEQ ID NO: 1023


30 aa


SEQ ID NO: 1025


tyros segment 28


90 nts


SEQ ID NO: 1026


Polypeptide encoded by SEQ ID NO: 1025


30 aa


SEQ ID NO: 1027


tyros segment 29


90 nts


SEQ ID NO: 1028


Polypeptide encoded by SEQ ID NO: 1027


30 aa


SEQ ID NO: 1029


tyros segment 30


90 nts


SEQ ID NO: 1030


Polypeptide encoded by SEQ ID NO: 1029


30 aa


SEQ ID NO: 1031    tyros segment 31    90 nts
SEQ ID NO: 1032    Polypeptide encoded by SEQ ID NO: 1031    30 aa
SEQ ID NO: 1033    tyros segment 32    90 nts
SEQ ID NO: 1034    Polypeptide encoded by SEQ ID NO: 1033    30 aa
SEQ ID NO: 1035    tyros segment 33    90 nts
SEQ ID NO: 1036    Polypeptide encoded by SEQ ID NO: 1035    30 aa
SEQ ID NO: 1037    tyros segment 34    90 nts
SEQ ID NO: 1038    Polypeptide encoded by SEQ ID NO: 1037    30 aa
SEQ ID NO: 1039    tyros segment 35    69 nts
SEQ ID NO: 1040    Polypeptide encoded by SEQ ID NO: 1039    23 aa
SEQ ID NO: 1041    trp2 segment 1    90 nts
SEQ ID NO: 1042    Polypeptide encoded by SEQ ID NO: 1041    30 aa
SEQ ID NO: 1043    trp2 segment 2    90 nts
SEQ ID NO: 1044    Polypeptide encoded by SEQ ID NO: 1043    30 aa
SEQ ID NO: 1045    trp2 segment 3    90 nts
SEQ ID NO: 1046    Polypeptide encoded by SEQ ID NO: 1045    30 aa
SEQ ID NO: 1047    trp2 segment 4    90 nts
SEQ ID NO: 1048    Polypeptide encoded by SEQ ID NO: 1047    30 aa
SEQ ID NO: 1049    trp2 segment 5    90 nts
SEQ ID NO: 1050    Polypeptide encoded by SEQ ID NO: 1049    30 aa
SEQ ID NO: 1051    trp2 segment 6    90 nts
SEQ ID NO: 1052    Polypeptide encoded by SEQ ID NO: 1051    30 aa
SEQ ID NO: 1053    trp2 segment 7    90 nts
SEQ ID NO: 1054    Polypeptide encoded by SEQ ID NO: 1053    30 aa


SEQ ID NO: 1055    trp2 segment 8    90 nts
SEQ ID NO: 1056    Polypeptide encoded by SEQ ID NO: 1055    30 aa
SEQ ID NO: 1057    trp2 segment 9    90 nts
SEQ ID NO: 1058    Polypeptide encoded by SEQ ID NO: 1057    30 aa
SEQ ID NO: 1059    trp2 segment 10    90 nts
SEQ ID NO: 1060    Polypeptide encoded by SEQ ID NO: 1059    30 aa
SEQ ID NO: 1061    trp2 segment 11    90 nts
SEQ ID NO: 1062    Polypeptide encoded by SEQ ID NO: 1061    30 aa
SEQ ID NO: 1063    trp2 segment 12    90 nts
SEQ ID NO: 1064    Polypeptide encoded by SEQ ID NO: 1063    30 aa
SEQ ID NO: 1065    trp2 segment 13    90 nts
SEQ ID NO: 1066    Polypeptide encoded by SEQ ID NO: 1065    30 aa
SEQ ID NO: 1067    trp2 segment 14    90 nts
SEQ ID NO: 1068    Polypeptide encoded by SEQ ID NO: 1067    30 aa
SEQ ID NO: 1069    trp2 segment 15    90 nts
SEQ ID NO: 1070    Polypeptide encoded by SEQ ID NO: 1069    30 aa
SEQ ID NO: 1071    trp2 segment 16    90 nts
SEQ ID NO: 1072    Polypeptide encoded by SEQ ID NO: 1071    30 aa
SEQ ID NO: 1073    trp2 segment 17    90 nts
SEQ ID NO: 1074    Polypeptide encoded by SEQ ID NO: 1073    30 aa
SEQ ID NO: 1075    trp2 segment 18    90 nts
SEQ ID NO: 1076    Polypeptide encoded by SEQ ID NO: 1075    30 aa
SEQ ID NO: 1077    trp2 segment 19    90 nts
SEQ ID NO: 1078    Polypeptide encoded by SEQ ID NO: 1077    30 aa




SEQ ID NO: 1079    trp2 segment 20    90 nts
SEQ ID NO: 1080    Polypeptide encoded by SEQ ID NO: 1079    30 aa
SEQ ID NO: 1081    trp2 segment 21    90 nts
SEQ ID NO: 1082    Polypeptide encoded by SEQ ID NO: 1081    30 aa
SEQ ID NO: 1083    trp2 segment 22    90 nts
SEQ ID NO: 1084    Polypeptide encoded by SEQ ID NO: 1083    30 aa
SEQ ID NO: 1085    trp2 segment 23    90 nts
SEQ ID NO: 1086    Polypeptide encoded by SEQ ID NO: 1085    30 aa
SEQ ID NO: 1087    trp2 segment 24    90 nts
SEQ ID NO: 1088    Polypeptide encoded by SEQ ID NO: 1087    30 aa
SEQ ID NO: 1089    trp2 segment 25    90 nts
SEQ ID NO: 1090    Polypeptide encoded by SEQ ID NO: 1089    30 aa
SEQ ID NO: 1091    trp2 segment 26    90 nts
SEQ ID NO: 1092    Polypeptide encoded by SEQ ID NO: 1091    30 aa
SEQ ID NO: 1093    trp2 segment 27    90 nts
SEQ ID NO: 1094    Polypeptide encoded by SEQ ID NO: 1093    30 aa
SEQ ID NO: 1095    trp2 segment 28    90 nts
SEQ ID NO: 1096    Polypeptide encoded by SEQ ID NO: 1095    30 aa
SEQ ID NO: 1097    trp2 segment 29    90 nts
SEQ ID NO: 1098    Polypeptide encoded by SEQ ID NO: 1097    30 aa
SEQ ID NO: 1099    trp2 segment 30    90 nts
SEQ ID NO: 1100    Polypeptide encoded by SEQ ID NO: 1099    30 aa
SEQ ID NO: 1101    trp2 segment 31    90 nts
SEQ ID NO: 1102    Polypeptide encoded by SEQ ID NO: 1101    30 aa


SEQ ID NO: 1103    trp2 segment 32    90 nts
SEQ ID NO: 1104    Polypeptide encoded by SEQ ID NO: 1103    30 aa
SEQ ID NO: 1105    trp2 segment 33    90 nts
SEQ ID NO: 1106    Polypeptide encoded by SEQ ID NO: 1105    30 aa
SEQ ID NO: 1107    trp2 segment 34    84 nts
SEQ ID NO: 1108    Polypeptide encoded by SEQ ID NO: 1107    28 aa
SEQ ID NO: 1109    MC1R segment 1    90 nts
SEQ ID NO: 1110    Polypeptide encoded by SEQ ID NO: 1109    30 aa
SEQ ID NO: 1111    MC1R segment 2    90 nts
SEQ ID NO: 1112    Polypeptide encoded by SEQ ID NO: 1111    30 aa
SEQ ID NO: 1113    MC1R segment 3    90 nts
SEQ ID NO: 1114    Polypeptide encoded by SEQ ID NO: 1113    30 aa
SEQ ID NO: 1115    MC1R segment 4    90 nts
SEQ ID NO: 1116    Polypeptide encoded by SEQ ID NO: 1115    30 aa
SEQ ID NO: 1117    MC1R segment 5    90 nts
SEQ ID NO: 1118    Polypeptide encoded by SEQ ID NO: 1117    30 aa
SEQ ID NO: 1119    MC1R segment 6    90 nts
SEQ ID NO: 1120    Polypeptide encoded by SEQ ID NO: 1119    30 aa
SEQ ID NO: 1121    MC1R segment 7    90 nts
SEQ ID NO: 1122    Polypeptide encoded by SEQ ID NO: 1121    30 aa
SEQ ID NO: 1123    MC1R segment 8    90 nts
SEQ ID NO: 1124    Polypeptide encoded by SEQ ID NO: 1123    30 aa
SEQ ID NO: 1125    MC1R segment 9    90 nts
SEQ ID NO: 1126    Polypeptide encoded by SEQ ID NO: 1125    30 aa
SEQ ID NO: 1127    MC1R segment 10    90 nts
SEQ ID NO: 1128    Polypeptide encoded by SEQ ID NO: 1127    30 aa


SEQ ID NO: 1129    MC1R segment 11    90 nts
SEQ ID NO: 1130    Polypeptide encoded by SEQ ID NO: 1129    30 aa
SEQ ID NO: 1131    MC1R segment 12    90 nts
SEQ ID NO: 1132    Polypeptide encoded by SEQ ID NO: 1131    30 aa
SEQ ID NO: 1133    MC1R segment 13    90 nts
SEQ ID NO: 1134    Polypeptide encoded by SEQ ID NO: 1133    30 aa
SEQ ID NO: 1135    MC1R segment 14    90 nts
SEQ ID NO: 1136    Polypeptide encoded by SEQ ID NO: 1135    30 aa
SEQ ID NO: 1137    MC1R segment 15    90 nts
SEQ ID NO: 1138    Polypeptide encoded by SEQ ID NO: 1137    30 aa
SEQ ID NO: 1139    MC1R segment 16    90 nts
SEQ ID NO: 1140    Polypeptide encoded by SEQ ID NO: 1139    30 aa
SEQ ID NO: 1141    MC1R segment 17    90 nts
SEQ ID NO: 1142    Polypeptide encoded by SEQ ID NO: 1141    30 aa
SEQ ID NO: 1143    MC1R segment 18    90 nts
SEQ ID NO: 1144    Polypeptide encoded by SEQ ID NO: 1143    30 aa
SEQ ID NO: 1145    MC1R segment 19    90 nts
SEQ ID NO: 1146    Polypeptide encoded by SEQ ID NO: 1145    30 aa
SEQ ID NO: 1147    MC1R segment 20    90 nts
SEQ ID NO: 1148    Polypeptide encoded by SEQ ID NO: 1147    30 aa
SEQ ID NO: 1149    MC1R segment 21    63 nts
SEQ ID NO: 1150    Polypeptide encoded by SEQ ID NO: 1149    21 aa
SEQ ID NO: 1151    MUC1F segment 1    90 nts
SEQ ID NO: 1152    Polypeptide encoded by SEQ ID NO: 1151    30 aa


SEQ ID NO: 1153    MUC1F segment 2    90 nts
SEQ ID NO: 1154    Polypeptide encoded by SEQ ID NO: 1153    30 aa
SEQ ID NO: 1155    MUC1F segment 3    90 nts
SEQ ID NO: 1156    Polypeptide encoded by SEQ ID NO: 1155    30 aa
SEQ ID NO: 1157    MUC1F segment 4    90 nts
SEQ ID NO: 1158    Polypeptide encoded by SEQ ID NO: 1157    30 aa
SEQ ID NO: 1159    MUC1F segment 5    90 nts
SEQ ID NO: 1160    Polypeptide encoded by SEQ ID NO: 1159    30 aa
SEQ ID NO: 1161    MUC1F segment 6    90 nts
SEQ ID NO: 1162    Polypeptide encoded by SEQ ID NO: 1161    30 aa
SEQ ID NO: 1163    MUC1F segment 7    90 nts
SEQ ID NO: 1164    Polypeptide encoded by SEQ ID NO: 1163    30 aa
SEQ ID NO: 1165    MUC1F segment 8    72 nts
SEQ ID NO: 1166    Polypeptide encoded by SEQ ID NO: 1165    24 aa
SEQ ID NO: 1167    MUC1R segment 1    90 nts
SEQ ID NO: 1168    Polypeptide encoded by SEQ ID NO: 1167    30 aa
SEQ ID NO: 1169    MUC1R segment 2    90 nts
SEQ ID NO: 1170    Polypeptide encoded by SEQ ID NO: 1169    30 aa
SEQ ID NO: 1171    MUC1R segment 3    90 nts
SEQ ID NO: 1172    Polypeptide encoded by SEQ ID NO: 1171    30 aa


SEQ ID NO: 1173    MUC1R segment 4    90 nts
SEQ ID NO: 1174    Polypeptide encoded by SEQ ID NO: 1173    30 aa
SEQ ID NO: 1175    MUC1R segment 5    90 nts
SEQ ID NO: 1176    Polypeptide encoded by SEQ ID NO: 1175    30 aa
SEQ ID NO: 1177    MUC1R segment 6    90 nts
SEQ ID NO: 1178    Polypeptide encoded by SEQ ID NO: 1177    30 aa
SEQ ID NO: 1179    MUC1R segment 7    90 nts
SEQ ID NO: 1180    Polypeptide encoded by SEQ ID NO: 1179    30 aa
SEQ ID NO: 1181    MUC1R segment 8    90 nts
SEQ ID NO: 1182    Polypeptide encoded by SEQ ID NO: 1181    30 aa
SEQ ID NO: 1183    MUC1R segment 9    90 nts
SEQ ID NO: 1184    Polypeptide encoded by SEQ ID NO: 1183    30 aa
SEQ ID NO: 1185    MUC1R segment 10    90 nts
SEQ ID NO: 1186    Polypeptide encoded by SEQ ID NO: 1185    30 aa
SEQ ID NO: 1187    MUC1R segment 11    90 nts
SEQ ID NO: 1188    Polypeptide encoded by SEQ ID NO: 1187    30 aa
SEQ ID NO: 1189    MUC1R segment 12    90 nts
SEQ ID NO: 1190    Polypeptide encoded by SEQ ID NO: 1189    30 aa
SEQ ID NO: 1191    MUC1R segment 13    90 nts
SEQ ID NO: 1192    Polypeptide encoded by SEQ ID NO: 1191    30 aa
SEQ ID NO: 1193    MUC1R segment 14    90 nts
SEQ ID NO: 1194    Polypeptide encoded by SEQ ID NO: 1193    30 aa
SEQ ID NO: 1195    MUC1R segment 15    90 nts
SEQ ID NO: 1196    Polypeptide encoded by SEQ ID NO: 1195    30 aa
SEQ ID NO: 1197    MUC1R segment 16    90 nts
SEQ ID NO: 1198    Polypeptide encoded by SEQ ID NO: 1197    30 aa


SEQ ID NO: 1199    MUC1R segment 17    90 nts
SEQ ID NO: 1200    Polypeptide encoded by SEQ ID NO: 1199    30 aa
SEQ ID NO: 1201    MUC1R segment 18    90 nts
SEQ ID NO: 1202    Polypeptide encoded by SEQ ID NO: 1201    30 aa
SEQ ID NO: 1203    MUC1R segment 19    90 nts
SEQ ID NO: 1204    Polypeptide encoded by SEQ ID NO: 1203    30 aa
SEQ ID NO: 1205    MUC1R segment 20    90 nts
SEQ ID NO: 1206    Polypeptide encoded by SEQ ID NO: 1205    30 aa
SEQ ID NO: 1207    MUC1R segment 21    48 nts
SEQ ID NO: 1208    Polypeptide encoded by SEQ ID NO: 1207    16 aa
SEQ ID NO: 1209    Differentiation Savine    16638 nts
SEQ ID NO: 1210    Polypeptide encoded by SEQ ID NO: 1209    5546 aa
SEQ ID NO: 1211    BAGE segment 1    90 nts
SEQ ID NO: 1212    Polypeptide encoded by SEQ ID NO: 1211    30 aa
SEQ ID NO: 1213    BAGE segment 2    90 nts
SEQ ID NO: 1214    Polypeptide encoded by SEQ ID NO: 1213    30 aa
SEQ ID NO: 1215    BAGE segment 3    51 nts
SEQ ID NO: 1216    Polypeptide encoded by SEQ ID NO: 1215    17 aa
SEQ ID NO: 1217    GAGE-1 segment 1    90 nts
SEQ ID NO: 1218    Polypeptide encoded by SEQ ID NO: 1217    30 aa
SEQ ID NO: 1219    GAGE-1 segment 2    90 nts
SEQ ID NO: 1220    Polypeptide encoded by SEQ ID NO: 1219    30 aa
SEQ ID NO: 1221    GAGE-1 segment 3    90 nts
SEQ ID NO: 1222    Polypeptide encoded by SEQ ID NO: 1221    30 aa
SEQ ID NO: 1223    GAGE-1 segment 4    90 nts
SEQ ID NO: 1224    Polypeptide encoded by SEQ ID NO: 1223    30 aa


SEQ ID NO: 1225    GAGE-1 segment 5    90 nts
SEQ ID NO: 1226    Polypeptide encoded by SEQ ID NO: 1225    30 aa
SEQ ID NO: 1227    GAGE-1 segment 6    90 nts
SEQ ID NO: 1228    Polypeptide encoded by SEQ ID NO: 1227    30 aa
SEQ ID NO: 1229    GAGE-1 segment 7    90 nts
SEQ ID NO: 1230    Polypeptide encoded by SEQ ID NO: 1229    30 aa
SEQ ID NO: 1231    GAGE-1 segment 8    90 nts
SEQ ID NO: 1232    Polypeptide encoded by SEQ ID NO: 1231    30 aa
SEQ ID NO: 1233    GAGE-1 segment 9    66 nts
SEQ ID NO: 1234    Polypeptide encoded by SEQ ID NO: 1233    22 aa
SEQ ID NO: 1235    gp100In4 segment 1    90 nts
SEQ ID NO: 1236    Polypeptide encoded by SEQ ID NO: 1235    30 aa
SEQ ID NO: 1237    gp100In4 segment 2    90 nts
SEQ ID NO: 1238    Polypeptide encoded by SEQ ID NO: 1237    30 aa
SEQ ID NO: 1239    gp100In4 segment 3    75 nts
SEQ ID NO: 1240    Polypeptide encoded by SEQ ID NO: 1239    25 aa
SEQ ID NO: 1241    MAGE-1 segment 1    90 nts
SEQ ID NO: 1242    Polypeptide encoded by SEQ ID NO: 1241    30 aa
SEQ ID NO: 1243    MAGE-1 segment 2    90 nts
SEQ ID NO: 1244    Polypeptide encoded by SEQ ID NO: 1243    30 aa
SEQ ID NO: 1245    MAGE-1 segment 3    90 nts
SEQ ID NO: 1246    Polypeptide encoded by SEQ ID NO: 1245    30 aa



WO 01/090197 



PCT/AU01/00622 





SEQ ID NO: 1247    MAGE-1 segment 4    90 nts
SEQ ID NO: 1248    Polypeptide encoded by SEQ ID NO: 1247    30 aa
SEQ ID NO: 1249    MAGE-1 segment 5    90 nts
SEQ ID NO: 1250    Polypeptide encoded by SEQ ID NO: 1249    30 aa
SEQ ID NO: 1251    MAGE-1 segment 6    90 nts
SEQ ID NO: 1252    Polypeptide encoded by SEQ ID NO: 1251    30 aa
SEQ ID NO: 1253    MAGE-1 segment 7    90 nts
SEQ ID NO: 1254    Polypeptide encoded by SEQ ID NO: 1253    30 aa
SEQ ID NO: 1255    MAGE-1 segment 8    90 nts
SEQ ID NO: 1256    Polypeptide encoded by SEQ ID NO: 1255    30 aa
SEQ ID NO: 1257    MAGE-1 segment 9    90 nts
SEQ ID NO: 1258    Polypeptide encoded by SEQ ID NO: 1257    30 aa
SEQ ID NO: 1259    MAGE-1 segment 10    90 nts
SEQ ID NO: 1260    Polypeptide encoded by SEQ ID NO: 1259    30 aa
SEQ ID NO: 1261    MAGE-1 segment 11    90 nts
SEQ ID NO: 1262    Polypeptide encoded by SEQ ID NO: 1261    30 aa
SEQ ID NO: 1263    MAGE-1 segment 12    90 nts
SEQ ID NO: 1264    Polypeptide encoded by SEQ ID NO: 1263    30 aa
SEQ ID NO: 1265    MAGE-1 segment 13    90 nts
SEQ ID NO: 1266    Polypeptide encoded by SEQ ID NO: 1265    30 aa
SEQ ID NO: 1267    MAGE-1 segment 14    90 nts
SEQ ID NO: 1268    Polypeptide encoded by SEQ ID NO: 1267    30 aa
SEQ ID NO: 1269    MAGE-1 segment 15    90 nts
SEQ ID NO: 1270    Polypeptide encoded by SEQ ID NO: 1269    30 aa
SEQ ID NO: 1271    MAGE-1 segment 16    90 nts
SEQ ID NO: 1272    Polypeptide encoded by SEQ ID NO: 1271    30 aa


SEQ ID NO: 1273    MAGE-1 segment 17    90 nts
SEQ ID NO: 1274    Polypeptide encoded by SEQ ID NO: 1273    30 aa
SEQ ID NO: 1275    MAGE-1 segment 18    90 nts
SEQ ID NO: 1276    Polypeptide encoded by SEQ ID NO: 1275    30 aa
SEQ ID NO: 1277    MAGE-1 segment 19    90 nts
SEQ ID NO: 1278    Polypeptide encoded by SEQ ID NO: 1277    30 aa
SEQ ID NO: 1279    MAGE-1 segment 20    84 nts
SEQ ID NO: 1280    Polypeptide encoded by SEQ ID NO: 1279    28 aa
SEQ ID NO: 1281    MAGE-3 segment 1    90 nts
SEQ ID NO: 1282    Polypeptide encoded by SEQ ID NO: 1281    30 aa
SEQ ID NO: 1283    MAGE-3 segment 2    90 nts
SEQ ID NO: 1284    Polypeptide encoded by SEQ ID NO: 1283    30 aa
SEQ ID NO: 1285    MAGE-3 segment 3    90 nts
SEQ ID NO: 1286    Polypeptide encoded by SEQ ID NO: 1285    30 aa
SEQ ID NO: 1287    MAGE-3 segment 4    90 nts
SEQ ID NO: 1288    Polypeptide encoded by SEQ ID NO: 1287    30 aa
SEQ ID NO: 1289    MAGE-3 segment 5    90 nts
SEQ ID NO: 1290    Polypeptide encoded by SEQ ID NO: 1289    30 aa
SEQ ID NO: 1291    MAGE-3 segment 6    90 nts
SEQ ID NO: 1292    Polypeptide encoded by SEQ ID NO: 1291    30 aa


SEQ ID NO: 1293    MAGE-3 segment 7    90 nts
SEQ ID NO: 1294    Polypeptide encoded by SEQ ID NO: 1293    30 aa
SEQ ID NO: 1295    MAGE-3 segment 8    90 nts
SEQ ID NO: 1296    Polypeptide encoded by SEQ ID NO: 1295    30 aa
SEQ ID NO: 1297    MAGE-3 segment 9    90 nts
SEQ ID NO: 1298    Polypeptide encoded by SEQ ID NO: 1297    30 aa
SEQ ID NO: 1299    MAGE-3 segment 10    90 nts
SEQ ID NO: 1300    Polypeptide encoded by SEQ ID NO: 1299    30 aa
SEQ ID NO: 1301    MAGE-3 segment 11    90 nts
SEQ ID NO: 1302    Polypeptide encoded by SEQ ID NO: 1301    30 aa
SEQ ID NO: 1303    MAGE-3 segment 12    90 nts
SEQ ID NO: 1304    Polypeptide encoded by SEQ ID NO: 1303    30 aa
SEQ ID NO: 1305    MAGE-3 segment 13    90 nts
SEQ ID NO: 1306    Polypeptide encoded by SEQ ID NO: 1305    30 aa
SEQ ID NO: 1307    MAGE-3 segment 14    90 nts
SEQ ID NO: 1308    Polypeptide encoded by SEQ ID NO: 1307    30 aa
SEQ ID NO: 1309    MAGE-3 segment 15    90 nts
SEQ ID NO: 1310    Polypeptide encoded by SEQ ID NO: 1309    30 aa
SEQ ID NO: 1311    MAGE-3 segment 16    90 nts
SEQ ID NO: 1312    Polypeptide encoded by SEQ ID NO: 1311    30 aa
SEQ ID NO: 1313    MAGE-3 segment 17    90 nts
SEQ ID NO: 1314    Polypeptide encoded by SEQ ID NO: 1313    30 aa
SEQ ID NO: 1315    MAGE-3 segment 18    90 nts
SEQ ID NO: 1316    Polypeptide encoded by SEQ ID NO: 1315    30 aa
SEQ ID NO: 1317    MAGE-3 segment 19    90 nts
SEQ ID NO: 1318    Polypeptide encoded by SEQ ID NO: 1317    30 aa
SEQ ID NO: 1319    MAGE-3 segment 20    90 nts
SEQ ID NO: 1320    Polypeptide encoded by SEQ ID NO: 1319    30 aa


SEQ ID NO: 1321    MAGE-3 segment 21    54 nts
SEQ ID NO: 1322    Polypeptide encoded by SEQ ID NO: 1321    18 aa
SEQ ID NO: 1323    PRAME segment 1    90 nts
SEQ ID NO: 1324    Polypeptide encoded by SEQ ID NO: 1323    30 aa
SEQ ID NO: 1325    PRAME segment 2    90 nts
SEQ ID NO: 1326    Polypeptide encoded by SEQ ID NO: 1325    30 aa
SEQ ID NO: 1327    PRAME segment 3    90 nts
SEQ ID NO: 1328    Polypeptide encoded by SEQ ID NO: 1327    30 aa
SEQ ID NO: 1329    PRAME segment 4    90 nts
SEQ ID NO: 1330    Polypeptide encoded by SEQ ID NO: 1329    30 aa
SEQ ID NO: 1331    PRAME segment 5    90 nts
SEQ ID NO: 1332    Polypeptide encoded by SEQ ID NO: 1331    30 aa
SEQ ID NO: 1333    PRAME segment 6    90 nts
SEQ ID NO: 1334    Polypeptide encoded by SEQ ID NO: 1333    30 aa
SEQ ID NO: 1335    PRAME segment 7    90 nts
SEQ ID NO: 1336    Polypeptide encoded by SEQ ID NO: 1335    30 aa
SEQ ID NO: 1337    PRAME segment 8    90 nts
SEQ ID NO: 1338    Polypeptide encoded by SEQ ID NO: 1337    30 aa
SEQ ID NO: 1339    PRAME segment 9    90 nts
SEQ ID NO: 1340    Polypeptide encoded by SEQ ID NO: 1339    30 aa
SEQ ID NO: 1341    PRAME segment 10    90 nts
SEQ ID NO: 1342    Polypeptide encoded by SEQ ID NO: 1341    30 aa
SEQ ID NO: 1343    PRAME segment 11    90 nts
SEQ ID NO: 1344    Polypeptide encoded by SEQ ID NO: 1343    30 aa


SEQ ID NO: 1345    PRAME segment 12    90 nts
SEQ ID NO: 1346    Polypeptide encoded by SEQ ID NO: 1345    30 aa
SEQ ID NO: 1347    PRAME segment 13    90 nts
SEQ ID NO: 1348    Polypeptide encoded by SEQ ID NO: 1347    30 aa
SEQ ID NO: 1349    PRAME segment 14    90 nts
SEQ ID NO: 1350    Polypeptide encoded by SEQ ID NO: 1349    30 aa
SEQ ID NO: 1351    PRAME segment 15    90 nts
SEQ ID NO: 1352    Polypeptide encoded by SEQ ID NO: 1351    30 aa
SEQ ID NO: 1353    PRAME segment 16    90 nts
SEQ ID NO: 1354    Polypeptide encoded by SEQ ID NO: 1353    30 aa
SEQ ID NO: 1355    PRAME segment 17    90 nts
SEQ ID NO: 1356    Polypeptide encoded by SEQ ID NO: 1355    30 aa
SEQ ID NO: 1357    PRAME segment 18    90 nts
SEQ ID NO: 1358    Polypeptide encoded by SEQ ID NO: 1357    30 aa
SEQ ID NO: 1359    PRAME segment 19    90 nts
SEQ ID NO: 1360    Polypeptide encoded by SEQ ID NO: 1359    30 aa
SEQ ID NO: 1361    PRAME segment 20    90 nts
SEQ ID NO: 1362    Polypeptide encoded by SEQ ID NO: 1361    30 aa
SEQ ID NO: 1363    PRAME segment 21    90 nts
SEQ ID NO: 1364    Polypeptide encoded by SEQ ID NO: 1363    30 aa
SEQ ID NO: 1365    PRAME segment 22    90 nts
SEQ ID NO: 1366    Polypeptide encoded by SEQ ID NO: 1365    30 aa
SEQ ID NO: 1367    PRAME segment 23    90 nts
SEQ ID NO: 1368    Polypeptide encoded by SEQ ID NO: 1367    30 aa


SEQ ID NO: 1369    PRAME segment 24    90 nts
SEQ ID NO: 1370    Polypeptide encoded by SEQ ID NO: 1369    30 aa
SEQ ID NO: 1371    PRAME segment 25    90 nts
SEQ ID NO: 1372    Polypeptide encoded by SEQ ID NO: 1371    30 aa
SEQ ID NO: 1373    PRAME segment 26    90 nts
SEQ ID NO: 1374    Polypeptide encoded by SEQ ID NO: 1373    30 aa
SEQ ID NO: 1375    PRAME segment 27    90 nts
SEQ ID NO: 1376    Polypeptide encoded by SEQ ID NO: 1375    30 aa
SEQ ID NO: 1377    PRAME segment 28    90 nts
SEQ ID NO: 1378    Polypeptide encoded by SEQ ID NO: 1377    30 aa
SEQ ID NO: 1379    PRAME segment 29    90 nts
SEQ ID NO: 1380    Polypeptide encoded by SEQ ID NO: 1379    30 aa
SEQ ID NO: 1381    PRAME segment 30    90 nts
SEQ ID NO: 1382    Polypeptide encoded by SEQ ID NO: 1381    30 aa
SEQ ID NO: 1383    PRAME segment 31    90 nts
SEQ ID NO: 1384    Polypeptide encoded by SEQ ID NO: 1383    30 aa
SEQ ID NO: 1385    PRAME segment 32    90 nts
SEQ ID NO: 1386    Polypeptide encoded by SEQ ID NO: 1385    30 aa
SEQ ID NO: 1387    PRAME segment 33    90 nts
SEQ ID NO: 1388    Polypeptide encoded by SEQ ID NO: 1387    30 aa
SEQ ID NO: 1389    PRAME segment 34    54 nts
SEQ ID NO: 1390    Polypeptide encoded by SEQ ID NO: 1389    18 aa
SEQ ID NO: 1391    TRP2IN2 segment 1    90 nts
SEQ ID NO: 1392    Polypeptide encoded by SEQ ID NO: 1391    30 aa


SEQ ID NO: 1393    TRP2IN2 segment 2    90 nts
SEQ ID NO: 1394    Polypeptide encoded by SEQ ID NO: 1393    30 aa
SEQ ID NO: 1395    TRP2IN2 segment 3    84 nts
SEQ ID NO: 1396    Polypeptide encoded by SEQ ID NO: 1395    28 aa
SEQ ID NO: 1397    NYNSO1a segment 1    90 nts
SEQ ID NO: 1398    Polypeptide encoded by SEQ ID NO: 1397    30 aa
SEQ ID NO: 1399    NYNSO1a segment 2    90 nts
SEQ ID NO: 1400    Polypeptide encoded by SEQ ID NO: 1399    30 aa
SEQ ID NO: 1401    NYNSO1a segment 3    90 nts
SEQ ID NO: 1402    Polypeptide encoded by SEQ ID NO: 1401    30 aa
SEQ ID NO: 1403    NYNSO1a segment 4    90 nts
SEQ ID NO: 1404    Polypeptide encoded by SEQ ID NO: 1403    30 aa
SEQ ID NO: 1405    NYNSO1a segment 5    90 nts
SEQ ID NO: 1406    Polypeptide encoded by SEQ ID NO: 1405    30 aa
SEQ ID NO: 1407    NYNSO1a segment 6    90 nts
SEQ ID NO: 1408    Polypeptide encoded by SEQ ID NO: 1407    30 aa
SEQ ID NO: 1409    NYNSO1a segment 7    90 nts
SEQ ID NO: 1410    Polypeptide encoded by SEQ ID NO: 1409    30 aa
SEQ ID NO: 1411    NYNSO1a segment 8    90 nts
SEQ ID NO: 1412    Polypeptide encoded by SEQ ID NO: 1411    30 aa
SEQ ID NO: 1413    NYNSO1a segment 9    90 nts
SEQ ID NO: 1414    Polypeptide encoded by SEQ ID NO: 1413    30 aa
SEQ ID NO: 1415    NYNSO1a segment 10    90 nts
SEQ ID NO: 1416    Polypeptide encoded by SEQ ID NO: 1415    30 aa


SEQ ID NO: 1417    NYNSO1a segment 11    90 nts
SEQ ID NO: 1418    Polypeptide encoded by SEQ ID NO: 1417    30 aa
SEQ ID NO: 1419    NYNSO1a segment 12    57 nts
SEQ ID NO: 1420    Polypeptide encoded by SEQ ID NO: 1419    19 aa
SEQ ID NO: 1421    NYNSO1b segment 1    90 nts
SEQ ID NO: 1422    Polypeptide encoded by SEQ ID NO: 1421    30 aa
SEQ ID NO: 1423    NYNSO1b segment 2    90 nts
SEQ ID NO: 1424    Polypeptide encoded by SEQ ID NO: 1423    30 aa
SEQ ID NO: 1425    NYNSO1b segment 3    90 nts
SEQ ID NO: 1426    Polypeptide encoded by SEQ ID NO: 1425    30 aa
SEQ ID NO: 1427    NYNSO1b segment 4    51 nts
SEQ ID NO: 1428    Polypeptide encoded by SEQ ID NO: 1427
SEQ ID NO: 1429    LAGE1 segment 1    90 nts
SEQ ID NO: 1430    Polypeptide encoded by SEQ ID NO: 1429    30 aa
SEQ ID NO: 1431    LAGE1 segment 2    90 nts
SEQ ID NO: 1432    Polypeptide encoded by SEQ ID NO: 1431    30 aa
SEQ ID NO: 1433    LAGE1 segment 3    90 nts
SEQ ID NO: 1434    Polypeptide encoded by SEQ ID NO: 1433    30 aa
SEQ ID NO: 1435    LAGE1 segment 4    90 nts
SEQ ID NO: 1436    Polypeptide encoded by SEQ ID NO: 1435    30 aa
SEQ ID NO: 1437    LAGE1 segment 5    90 nts
SEQ ID NO: 1438    Polypeptide encoded by SEQ ID NO: 1437    30 aa
SEQ ID NO: 1439    LAGE1 segment 6    90 nts
SEQ ID NO: 1440    Polypeptide encoded by SEQ ID NO: 1439    30 aa


SEQ ID NO: 1441    LAGE1 segment 7    90 nts
SEQ ID NO: 1442    Polypeptide encoded by SEQ ID NO: 1441    30 aa
SEQ ID NO: 1443    LAGE1 segment 8    90 nts
SEQ ID NO: 1444    Polypeptide encoded by SEQ ID NO: 1443    30 aa
SEQ ID NO: 1445    LAGE1 segment 9    90 nts
SEQ ID NO: 1446    Polypeptide encoded by SEQ ID NO: 1445    30 aa
SEQ ID NO: 1447    LAGE1 segment 10    90 nts
SEQ ID NO: 1448    Polypeptide encoded by SEQ ID NO: 1447    30 aa
SEQ ID NO: 1449    LAGE1 segment 11    90 nts
SEQ ID NO: 1450    Polypeptide encoded by SEQ ID NO: 1449    30 aa
SEQ ID NO: 1451    LAGE1 segment 12    57 nts
SEQ ID NO: 1452    Polypeptide encoded by SEQ ID NO: 1451    19 aa
SEQ ID NO: 1453    Melanoma cancer specific Savine    10623 nts
SEQ ID NO: 1454    Polypeptide encoded by SEQ ID NO: 1453    3541 aa
SEQ ID NO: 1455    Figure 16 A1S1 99mer    99 nts
SEQ ID NO: 1456    Figure 16 A1S2 100mer    100 nts
SEQ ID NO: 1457    Figure 16 A1S3 100mer    100 nts
SEQ ID NO: 1458    Figure 16 A1S4 100mer    100 nts
SEQ ID NO: 1459    Figure 16 A1S5 100mer    100 nts
SEQ ID NO: 1460    Figure 16 A1S6 99mer    99 nts
SEQ ID NO: 1461    Figure 16 A1S7 97mer    99 nts
SEQ ID NO: 1462    Figure 16 A1S8 100mer    100 nts
SEQ ID NO: 1463    Figure 16 A1S9 100mer    100 nts
SEQ ID NO: 1464    Figure 16 A1S10 75mer    76 nts
SEQ ID NO: 1465    Figure 16 A1F 20mer    20 nts
SEQ ID NO: 1466    Figure 16 A1R 20mer    20 nts


SEQ ID NO: 1467    Amino acid sequence of immunostimulatory domain of an invasin protein from Yersinia spp.    16 aa

DETAILED DESCRIPTION OF THE INVENTION

1. Definitions

The articles "a" and "an" are used herein to refer to one or to more than one (i.e.,
to at least one) of the grammatical object of the article. By way of example, "an element"
means one element or more than one element.

As used herein, the term "about" refers to a quantity, level, value, dimension,
size, or amount that varies by as much as 30%, preferably by as much as 20%, and more 
preferably by as much as 10% to a reference quantity, level, value, dimension, size, or 
amount. 
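
By way of illustration only, the three tolerance levels in the foregoing definition can be sketched as a simple numeric check. The sketch assumes that "varies by as much as" means a symmetric relative deviation from the reference value; the names is_about and about_tiers are hypothetical and do not appear in this specification.

```python
# Illustrative sketch only: tests whether a value is "about" a reference value.
# Assumption: the deviation is symmetric and measured relative to the reference.

TOLERANCES = {
    "broadest": 0.30,        # varies by as much as 30%
    "preferred": 0.20,       # preferably by as much as 20%
    "more preferred": 0.10,  # more preferably by as much as 10%
}


def is_about(value, reference, tolerance=0.30):
    """Return True if value deviates from reference by at most the given fraction."""
    if reference == 0:
        # A relative deviation is undefined for a zero reference; only an exact match counts.
        return value == 0
    return abs(value - reference) <= tolerance * abs(reference)


def about_tiers(value, reference):
    """Report, for each tolerance level, whether value is 'about' reference."""
    return {name: is_about(value, reference, tol) for name, tol in TOLERANCES.items()}


if __name__ == "__main__":
    # A 35-residue molecule compared with a 30-residue reference (deviation of roughly 17%).
    print(about_tiers(35, 30))
```

Under these assumptions a value of 35 is "about" 30 at the 30% and 20% levels but not at the 10% level.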

By "antigen-binding molecule" is meant a molecule that has binding affinity for a
target antigen. It will be understood that this term extends to immunoglobulins,
immunoglobulin fragments and non-immunoglobulin derived protein frameworks that
exhibit antigen-binding activity.

The term "clade" as used herein refers to a hypothetical species of an organism 
and its descendants or a monophyletic group of organisms. Clades carry a definition, based
on ancestry, and a diagnosis, based on synapomorphies. It should be noted that diagnoses 
of clades could change while definitions do not. 
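
The ancestry-based part of this definition can be illustrated with a minimal sketch in which a phylogeny is represented as a mapping from each node to its children. The tree, the node labels and the function names (clade, leaves, is_monophyletic) are hypothetical and serve only to show that a clade is an ancestor together with all of its descendants.

```python
# Illustrative sketch only: a clade as an ancestor plus all of its descendants.
# The tree is a dict mapping each node to a list of its children (hypothetical labels).

def clade(tree, node):
    """Return the set containing node and every descendant of node."""
    members = {node}
    stack = [node]
    while stack:
        for child in tree.get(stack.pop(), []):
            if child not in members:
                members.add(child)
                stack.append(child)
    return members


def leaves(tree, node):
    """Return the terminal members (nodes without children) of the clade rooted at node."""
    return {n for n in clade(tree, node) if not tree.get(n)}


def is_monophyletic(tree, root, group):
    """Return True if group is exactly the set of terminal members of some clade."""
    group = set(group)
    return any(leaves(tree, n) == group for n in clade(tree, root))


if __name__ == "__main__":
    tree = {"root": ["X", "Y"], "X": ["A", "B"], "Y": ["C"]}
    print(clade(tree, "X"))                             # X together with A and B
    print(is_monophyletic(tree, "root", {"A", "B"}))    # True
    print(is_monophyletic(tree, "root", {"A", "C"}))    # False
```

In this toy tree, A and B form a monophyletic group because they are all of the descendants of X, whereas A and C do not, since their common ancestor also gives rise to B.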

Throughout this specification, unless the context requires otherwise, the words 
"comprise", "comprises" and "comprising" will be understood to imply the inclusion of a 
stated step or element or group of steps or elements but not the exclusion of any other step
or element or group of steps or elements. 

By "expression vector" is meant any autonomous genetic element capable of 
directing the synthesis of a protein encoded by the vector. Such expression vectors are 
known by practitioners in the art. 

As used herein, the term "function" refers to a biological, enzymatic, or
therapeutic function.

"Homology" refers to the percentage number of amino acids that are identical or
constitute conservative substitutions as defined in Table B infra. Homology may be
determined using sequence comparison programs such as GAP (Deveraux et al. 1984,
Nucleic Acids Research 12, 387-395). In this way, sequences of a similar or substantially
different length to those cited herein might be compared by insertion of gaps into the
alignment, such gaps being determined, for example, by the comparison algorithm used by
GAP.
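
A minimal sketch of how such a percentage might be computed from two sequences that have already been aligned (for example by GAP) is given below. It is not the GAP algorithm. The substitution groups shown are a hypothetical stand-in and are not the groups of Table B, and the choice of denominator (all alignment columns that are not gap-to-gap) is likewise an assumption.

```python
# Illustrative sketch only: percent homology over a pre-computed pairwise alignment.
# CONSERVATIVE_GROUPS is a hypothetical placeholder, NOT Table B of this specification.

CONSERVATIVE_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("ST"),    # hydroxyl
    set("NQ"),    # amide
]


def is_conservative(a, b):
    """Return True if residues a and b fall within the same substitution group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def percent_homology(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned positions that are identical or conservative substitutions.

    Both inputs must be the same length, with gaps already inserted by an
    alignment program. Columns where both sequences have a gap are ignored;
    columns with a gap in one sequence count as non-homologous.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    homologous = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == gap or b == gap:
            continue
        if a == b or is_conservative(a, b):
            homologous += 1
    return 100.0 * homologous / compared if compared else 0.0


if __name__ == "__main__":
    print(percent_homology("ACDK", "ACER"))  # two identities plus D/E and K/R
    print(percent_homology("ACDK", "AC-K"))  # one gapped column out of four
```

Counting a gapped column as non-homologous is one of several possible conventions; a different denominator would give a different percentage for the same alignment.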

To enhance an immune response ("immunoenhancement"), as is well-known in
the art, means to increase an animal's capacity to respond to foreign or disease-specific
antigens (e.g., cancer antigens), i.e., those cells primed to attack such antigens are increased
in number, activity, and ability to detect and destroy those antigens. Strength of
immune response is measured by standard tests including: direct measurement of
peripheral blood lymphocytes by means known to the art; natural killer cell cytotoxicity
assays (see, e.g., Provincial M. et al. (1992, J. Immunol. Meth. 155: 19-24)); cell
proliferation assays (see, e.g., Vollenweider, I. and Groscurth, P. J. (1992, J. Immunol.
Meth. 149: 133-135)); immunoassays of immune cells and subsets (see, e.g., Loeffler, D.
A., et al. (1992, Cytom. 13: 169-174); Rivoltini, L., et al. (1992, Can. Immunol.
Immunother. 34: 241-251)); or skin tests for cell-mediated immunity (see, e.g., Chang, A.
E. et al. (1993, Cancer Res. 53: 1043-1050)). Any statistically significant increase in
strength of immune response as measured by the foregoing tests is considered "enhanced
immune response", "immunoenhancement" or "immunopotentiation" as used herein.
Enhanced immune response is also indicated by physical manifestations such as fever and
inflammation, as well as healing of systemic and local infections, and reduction of
symptoms in disease, i.e., decrease in tumour size, alleviation of symptoms of a disease or
condition including, but not restricted to, leprosy, tuberculosis, malaria, aphthous ulcers,
herpetic and papillomatous warts, gingivitis, atherosclerosis, the concomitants of AIDS
such as Kaposi's sarcoma, bronchial infections, and the like. Such physical manifestations
also define "enhanced immune response", "immunoenhancement" or
"immunopotentiation" as used herein.



Reference herein to "immuno-interactive" includes reference to any interaction,
reaction, or other form of association between molecules and in particular where one of the
molecules is, or mimics, a component of the immune system.

By "isolated" is meant material that is substantially or essentially free from
components that normally accompany it in its native state.

By "modulating" is meant increasing or decreasing, either directly or indirectly, 
an immune response against a target antigen of a member selected from the group 
consisting of a cancer and an organism, preferably a pathogenic organism.

By "natural gene" is meant a gene that naturally encodes a protein.

The term "natural polypeptide" as used herein refers to a polypeptide that exists
in nature. 

By "obtained from" is meant that a sample such as, for example, a polynucleotide 
extract or polypeptide extract is isolated from, or derived from, a particular source of the
host. For example, the extract can be obtained from a tissue or a biological fluid isolated 
directly from the host. 

The term "oligonucleotide" as used herein refers to a polymer composed of a 
multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related
structural variants or synthetic analogues thereof) linked via phosphodiester bonds (or
related structural variants or synthetic analogues thereof). Thus, while the term 
"oligonucleotide" typically refers to a nucleotide polymer in which the nucleotide residues 
and linkages between them are naturally occurring, it will be understood that the term also 
includes within its scope various analogues including, but not restricted to, peptide nucleic
acids (PNAs), phosphoramidates, phosphorothioates, methyl phosphonates, 2-O-methyl
ribonucleic acids, and the like. The exact size of the molecule can vary depending on the 
particular application. An oligonucleotide is typically rather short in length, generally from 
about 10 to 30 nucleotide residues, but the term can refer to molecules of any length, 
although the term "polynucleotide" or "nucleic acid" is typically used for large
oligonucleotides.

By "operably linked" is meant that transcriptional and translational regulatory 
polynucleotides are positioned relative to a polypeptide-encoding polynucleotide in such a 
manner that the polynucleotide is transcribed and the polypeptide is translated. 

The term "parent polypeptide" as used herein typically refers to a polypeptide
encoded by a natural gene. However, it is possible that the parent polypeptide corresponds
to a protein that is not naturally-occurring but has been engineered using recombinant
techniques. In this instance, a polynucleotide encoding the parent polypeptide may
comprise different but synonymous codons relative to a natural gene encoding the same
polypeptide. Alternatively, the parent polypeptide may not correspond to a natural 
polypeptide sequence. For example, the parent polypeptide may comprise one or more 
consensus sequences common to a plurality of polypeptides. 

The term "patient" refers to a patient of human or other mammalian origin and includes any
individual it is desired to examine or treat using the methods of the invention. However, it
will be understood that "patient" does not imply that symptoms are present. Suitable
mammals that fall within the scope of the invention include, but are not restricted to,
primates, livestock animals (e.g., sheep, cows, horses, donkeys, pigs), laboratory test
animals (e.g., rabbits, mice, rats, guinea pigs, hamsters), companion animals (e.g., cats,
dogs) and captive wild animals (e.g., foxes, deer, dingoes).

By "pharmaceutically-acceptable carrier" is meant a solid or liquid filler, diluent 
or encapsulating substance that can be safely used in topical or systemic administration to a 
mammal.

"Polypeptide", "peptide" and "protein" are used interchangeably herein to refer to
a polymer of amino acid residues and to variants and synthetic analogues of the same.
Thus, these terms apply to amino acid polymers in which one or more amino acid residues 
is a synthetic non-naturally occurring amino acid, such as a chemical analogue of a 
corresponding naturally occurring amino acid, as well as to naturally-occurring amino acid 
polymers. 

The term "polynucleotide" or "nucleic acid" as used herein designates mRNA,
RNA, cRNA, cDNA or DNA. The term typically refers to oligonucleotides greater than 30
nucleotide residues in length.

By "primer" is meant an oligonucleotide which, when paired with a strand of 
DNA, is capable of initiating the synthesis of a primer extension product in the presence of 
a suitable polymerising agent. The primer is preferably single-stranded for maximum
efficiency in amplification but can alternatively be double-stranded. A primer must be
sufficiently long to prime the synthesis of extension products in the presence of the 
polymerisation agent. The length of the primer depends on many factors, including 
application, temperature to be employed, template reaction conditions, other reagents, and 
source of primers. For example, depending on the complexity of the target sequence, the
oligonucleotide primer typically contains 15 to 35 or more nucleotide residues, although it 
can contain fewer nucleotide residues. Primers can be large polynucleotides, such as from 
about 35 nucleotides to several kilobases or more. A primer can be selected to be
"substantially complementary" to the sequence on the template to which it is designed to
hybridise and serve as a site for the initiation of synthesis. By "substantially
complementary", it is meant that the primer is sufficiently complementary to hybridise
with a target polynucleotide. Preferably, the primer contains no mismatches with the
template to which it is designed to hybridise but this is not essential. For example,
non-complementary nucleotide residues can be attached to the 5' end of the primer, with the
remainder of the primer sequence being complementary to the template. Alternatively,
non-complementary nucleotide residues or a stretch of non-complementary nucleotide 
residues can be interspersed into a primer, provided that the primer sequence has sufficient 
complementarity with the sequence of the template to hybridise therewith and thereby form 
a template for synthesis of the extension product of the primer. 

"Probe" refers to a molecule that binds to a specific sequence or sub-sequence or
other moiety of another molecule. Unless otherwise indicated, the term "probe" typically
refers to a polynucleotide probe that binds to another polynucleotide, often called the 
"target polynucleotide", through complementary base pairing. Probes can bind target 
polynucleotides lacking complete sequence complementarity with the probe, depending on
the stringency of the hybridisation conditions. Probes can be labelled directly or indirectly.

By "recombinant polypeptide" is meant a polypeptide made using recombinant 
techniques, i.e., through the expression of a recombinant or synthetic polynucleotide. 

Terms used to describe sequence relationships between two or more 
polynucleotides or polypeptides include "reference sequence", "comparison window", 
"sequence identity", "percentage of sequence identity" and "substantial identity". A
"reference sequence" is at least 12 but frequently 15 to 18 and often at least 25 monomer
units, inclusive of nucleotides and amino acid residues, in length. Because two
polynucleotides may each comprise (1) a sequence (i.e., only a portion of the complete 
polynucleotide sequence) that is similar between the two polynucleotides, and (2) a 
sequence that is divergent between the two polynucleotides, sequence comparisons 
between two (or more) polynucleotides are typically performed by comparing sequences of
the two polynucleotides over a "comparison window" to identify and compare local 
regions of sequence similarity. A "comparison window" refers to a conceptual segment of 
at least 50 contiguous positions, usually about 50 to about 100, more usually about 100 to 
about 150 in which a sequence is compared to a reference sequence of the same number of
contiguous positions after the two sequences are optimally aligned. The comparison
window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared 
to the reference sequence (which does not comprise additions or deletions) for optimal 
alignment of the two sequences. Optimal alignment of sequences for aligning a comparison 
window may be conducted by computerised implementations of algorithms (GAP,
BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release
7.0, Genetics Computer Group, 575 Science Drive Madison, WI, USA) or by inspection 
and the best alignment (i.e., resulting in the highest percentage homology over the 
comparison window) generated by any of the various methods selected. Reference also 
may be made to the BLAST family of programs as for example disclosed by Altschul et
al., 1997, Nucl. Acids Res. 25:3389. A detailed discussion of sequence analysis can be
found in Unit 19.3 of Ausubel et al., "Current Protocols in Molecular Biology", John 
Wiley & Sons Inc, 1994-1998, Chapter 15. 

The term "sequence identity" as used herein refers to the extent that sequences 
are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis
over a window of comparison. Thus, a "percentage of sequence identity" is calculated by
comparing two optimally aligned sequences over the window of comparison, determining
the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the
identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys,
Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number
of matched positions, dividing the number of matched positions by the total number of
positions in the window of comparison (i.e., the window size), and multiplying the result
by 100 to yield the percentage of sequence identity. For the purposes of the present
invention, "sequence identity" will be understood to mean the "match percentage"
calculated by the DNASIS computer program (Version 2.5 for Windows; available from
Hitachi Software Engineering Co., Ltd., South San Francisco, California, USA) using
standard defaults as used in the reference manual accompanying the software. 
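
The generic calculation described above (matched positions divided by the window size, multiplied by 100) can be sketched as follows. This is an illustrative sketch only and is not the DNASIS "match percentage" routine; the function name percent_identity, the use of "-" as a gap character, and the example sequences are assumptions made for the illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a window of comparison.

    Both inputs are assumed to be already optimally aligned and therefore of
    equal length; `gap` marks an addition or deletion (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("window of comparison is empty")
    # A position matches when both sequences carry the same residue
    # and that residue is not a gap.
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Divide by the window size and scale to a percentage.
    return 100.0 * matched / len(aligned_a)


# Hypothetical 8-position window with a single mismatch: 7 of 8 positions match.
identity = percent_identity("ACGTACGT", "ACGTTCGT")
```

The same function applies to amino acid sequences written in one-letter code, since it only compares characters position by position.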

The term "synthetic polynucleotide" as used herein refers to a polynucleotide
formed in vitro by the manipulation of a polynucleotide into a form not normally found in
nature. For example, the synthetic polynucleotide can be in the form of an expression
vector. Generally, such expression vectors include transcriptional and translational
regulatory polynucleotides operably linked to the polynucleotide.

The term "synonymous codon" as used herein refers to a codon having a different
nucleotide sequence than another codon but encoding the same amino acid as that other
codon.

By "translational efficiency" is meant the efficiency of a cell's protein synthesis 
machinery to incorporate the amino acid encoded by a codon into a nascent polypeptide 
chain. This efficiency can be evidenced, for example, by the rate at which the cell is able to
synthesise the polypeptide from an RNA template comprising the codon, or by the amount 
of the polypeptide synthesised from such a template. 

By "vector" is meant a polynucleotide molecule, preferably a DNA molecule 
derived, for example, from a plasmid, bacteriophage, yeast or virus, into which a
polynucleotide can be inserted or cloned. A vector preferably contains one or more unique
restriction sites and can be capable of autonomous replication in a defined host cell 
including a target cell or tissue or a progenitor cell or tissue thereof, or be integrable with 
the genome of the defined host such that the cloned sequence is reproducible. Accordingly, 
the vector can be an autonomously replicating vector, i.e., a vector that exists as an
extrachromosomal entity, the replication of which is independent of chromosomal
replication, e.g., a linear or closed circular plasmid, an extrachromosomal element, a 
minichromosome, or an artificial chromosome. The vector can contain any means for 
assuring self-replication. Alternatively, the vector can be one which, when introduced into 
the host cell, is integrated into the genome and replicated together with the chromosome(s)
into which it has been integrated. A vector system can comprise a single vector or plasmid,
two or more vectors or plasmids, which together contain the total DNA to be introduced
into the genome of the host cell, or a transposon. The choice of the vector will typically
depend on the compatibility of the vector with the host cell into which the vector is to be 
introduced. In the present case, the vector is preferably a viral or viral-derived vector, 
which is operably functional in animal and preferably mammalian cells. Such a vector may
be derived from a poxvirus, an adenovirus or yeast. The vector can also include a selection
marker such as an antibiotic resistance gene that can be used for selection of suitable
transformants. Examples of such resistance genes are known to those of skill in the art and
include the nptII gene that confers resistance to the antibiotics kanamycin and G418
(Geneticin®) and the hph gene which confers resistance to the antibiotic hygromycin B. 

2. Synthetic polypeptides

The inventors have surprisingly discovered that the structure of a parent 
polypeptide can be disrupted sufficiently to impede, abrogate or otherwise alter at least one 
function of the parent polypeptide, while simultaneously minimising the destruction of 
potentially useful epitopes that are present in the parent polypeptide, by fusing, coupling or
otherwise linking together different segments of the parent polypeptide in a different 
relationship relative to their linkage in the parent polypeptide. As a result of this change in 
relationship, the sequence of the linked segments in the resulting synthetic polypeptide is 
different to a sequence contained within the parent polypeptide. The synthetic polypeptides 
of the invention are useful as immunopotentiating agents, and are referred to elsewhere in
the specification as scrambled antigen vaccines, super attenuated vaccines or "Savines". 

Thus, the invention broadly resides in a synthetic polypeptide comprising a 
plurality of different segments of at least one parent polypeptide, wherein said segments 
are linked together in a different relationship relative to their linkage in the at least one 
parent polypeptide.

It is preferable but not essential that the segments in said synthetic polypeptide are 
linked sequentially in a different order or arrangement relative to that of corresponding 
segments in said at least one parent polypeptide. For example, in the case of a parent 
polypeptide that comprises three contiguous or overlapping segments A-B-C-D, these 

20 segments may be linked in 23 other possible orders to form a synthetic polypeptide. These 
orders may be selected from the group consisting of: A-B-D-C, A-C-B-D, A-C-D-B, A-D- 
B-C, A-D-C-B, B-A-C-D, B-A-D-C, B-C-A-D, B-C-D-A, B-D-A-C, B-D-C-A, C-A-B-D, 
C-A-D-B, C-B-A-D, C-B-D-A, C-D-A-B, C-D-B-A, D-A-B-C, D-A-C-B, D-B-A-C, D-B- 
C-A, D-C-A-B, and D-C-B-A. Although the rearrangement of the segments is preferably 

25 random, it is especially preferable to exclude or otherwise minimise rearrangements that 
result in complete or partial reassembly of the parent sequence {e.g., ADBC, BACD, 
DABC). It will be appreciated, however, that the probability of such complete or partial 
reassembly diminishes as the number of segments for rearrangement increases. 
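
The enumeration above can be reproduced with a short sketch. It assumes that the segment labels are distinct and interprets "partial reassembly" as any two segments that are adjacent, in the same direction, in both the parent and the rearranged order (which covers the examples ADBC, BACD and DABC); that interpretation and the function names are assumptions made for illustration.

```python
from itertools import permutations


def other_orders(segments):
    """All orders of the segments except the parent order itself."""
    parent = tuple(segments)
    return [order for order in permutations(parent) if order != parent]


def reassembles_parent(order, parent):
    """True if any neighbouring pair in `order` is also a neighbouring
    pair, in the same direction, in `parent` (e.g. B-C within A-D-B-C)."""
    parent_pairs = set(zip(parent, parent[1:]))
    return any(pair in parent_pairs for pair in zip(order, order[1:]))


parent = ("A", "B", "C", "D")
candidates = other_orders(parent)  # every rearrangement of the four segments
scrambled = [o for o in candidates if not reassembles_parent(o, parent)]

for order in scrambled:
    print("-".join(order))
```

For four segments this leaves 11 of the 23 rearrangements, such as A-C-B-D and D-C-B-A, in which no parental junction is rebuilt.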

The order of the segments is suitably shuffled, reordered or otherwise rearranged 
relative to the order in which they exist in the parent polypeptide so that the structure of the
polypeptide is disrupted sufficiently to impede, abrogate or otherwise alter at least one
function associated with the parent polypeptide. Preferably, the segments of the parent
polypeptide are randomly rearranged in the synthetic polypeptide. 

The parent polypeptide is suitably a polypeptide that is associated with a disease 
or condition. For example, the parent polypeptide may be a polypeptide expressed by a 
pathogenic organism or a cancer. Alternatively, the parent polypeptide can be a self
peptide related to an autoimmune disease including, but not limited to, diseases such as
diabetes (e.g., juvenile diabetes), multiple sclerosis, rheumatoid arthritis, myasthenia
gravis, atopic dermatitis, psoriasis and ankylosing spondylitis. Accordingly, the
synthetic molecules of the present invention may also have utility for the induction of
tolerance in a subject afflicted with an autoimmune disease or condition or with an allergy
or other condition to which tolerance is desired. For example, tolerance may be induced by
contacting an immature dendritic cell of the individual to be treated with a synthetic
polypeptide of the invention or by expressing in an immature dendritic cell a synthetic
polynucleotide of the invention. Tolerance may also be induced against antigens causing
allergic responses (e.g., asthma, hay fever). In this case, the parent polypeptide is suitably
an allergenic protein including, but not restricted to, house-dust-mite allergenic proteins as 
for example described by Thomas and Smith (1998, Allergy, 53(9): 821-832). 

The pathogenic organism includes, but is not restricted to, yeast, a virus, a 
bacterium, and a parasite. Any natural host of the pathogenic organism is contemplated by
the present invention and includes, but is not limited to, mammals, avians and fish. In a
preferred embodiment, the pathogenic organism is a virus, which may be an RNA virus or
a DNA virus. Preferably, the RNA virus is Human Immunodeficiency Virus (HIV),
Poliovirus, Influenza virus, Rous sarcoma virus, or a Flavivirus such as Japanese
encephalitis virus. In a preferred embodiment, the RNA virus is a Hepatitis virus including,
but not limited to, Hepatitis strains A, B and C. Suitably, the DNA virus is a Herpesvirus
including, but not limited to, Herpes simplex virus, Epstein-Barr virus, Cytomegalovirus
and Parvovirus. In a preferred embodiment, the virus is HIV and the parent polypeptide is
suitably selected from env, gag, pol, vif, vpr, tat, rev, vpu and nef, or combination thereof.
In an alternate preferred embodiment, the virus is Hepatitis C1a virus and the parent
polypeptide is the Hepatitis C1a virus polyprotein.

In another embodiment, the pathogenic organism is a bacterium, which includes,
but is not restricted to, Neisseria species, Meningococcal species, Haemophilus species,
Salmonella species, Streptococcal species, Legionella species and Mycobacterium species.

In yet another embodiment, the pathogenic organism is a parasite, which includes, 
but is not restricted to, Plasmodium species, Schistosoma species, Leishmania species,
Trypanosoma species, Toxoplasma species and Giardia species. 

Any cancer or tumour is contemplated by the present invention. For example, the 
cancer or tumour includes, but is not restricted to, melanoma, lung cancer, breast cancer, 
cervical cancer, prostate cancer, colon cancer, pancreatic cancer, stomach cancer, bladder
cancer, kidney cancer, post transplant lymphoproliferative disease (PTLD), Hodgkin's
Lymphoma and the like. Preferably, the cancer or tumour relates to melanoma. In a
preferred embodiment of this type, the parent polypeptide is a melanocyte differentiation
antigen which is suitably selected from gp100, MART, TRP-1, Tyros, TRP2, MC1R,
MUC1F, MUC1R or a combination thereof. In an alternate preferred embodiment of this
type, the parent polypeptide is a melanoma-specific antigen which is suitably selected from
BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME, TRP2IN2, NYNSO1a,
NYNSO1b, LAGE1 or a combination thereof.

In a preferred embodiment, the segments are selected on the basis of size. A 
segment according to the invention may be of any suitable size that can be utilised to elicit
an immune response against an antigen encoded by the parent polypeptide. A number of
factors can influence the choice of segment size. For example, the size of a segment should 
be preferably chosen such that it includes, or corresponds to the size of, T cell epitopes and 
their processing requirements. Practitioners in the art will recognise that class I-restricted T
cell epitopes can be between 8 and 10 amino acids in length and if placed next to unnatural
flanking residues, such epitopes can generally require 2 to 3 natural flanking amino acids
to ensure that they are efficiently processed and presented. Class II-restricted T cell
epitopes can range between 12 and 25 amino acids in length and may not require natural
flanking residues for efficient proteolytic processing although it is believed that natural
flanking residues may play a role. Another important feature of class II-restricted epitopes
is that they generally contain a core of 9-10 amino acids in the middle which bind
specifically to class II MHC molecules with flanking sequences either side of this core
stabilising binding by associating with conserved structures on either side of class II MHC
antigens in a sequence independent manner (Brown et al., 1993). Thus the functional
region of class II-restricted epitopes is typically less than 15 amino acids long. The size of
linear B cell epitopes and the factors affecting their processing, like class II-restricted
epitopes, are quite variable although such epitopes are frequently smaller in size than 15
amino acids. From the foregoing, it is preferable, but not essential, that the size of the
segment is at least 4 amino acids, preferably at least 7 amino acids, more preferably at least
12 amino acids, more preferably at least 20 amino acids and more preferably at least 30 
amino acids. Suitably, the size of the segment is less than 2000 amino acids, more
preferably less than 1000 amino acids, more preferably less than 500 amino acids, more
preferably less than 200 amino acids, more preferably less than 100 amino acids, more 
preferably less than 80 amino acids and even more preferably less than 60 amino acids and 
still even more preferably less than 40 amino acids. In this regard, it is preferable that the 
size of the segments is as small as possible so that the synthetic polypeptide adopts a
functionally different structure relative to the structure of the parent polypeptide. It is also
preferable that the size of the segments is large enough to minimise loss of T cell epitopes. 
In an especially preferred embodiment, the size of the segment is about 30 amino acids. 

An optional spacer may be utilised to space adjacent segments relative to each 
other. Accordingly, an optional spacer may be interposed between some or all of the
segments. The spacer suitably alters proteolytic processing and/or presentation of adjacent
segment(s). In a preferred embodiment of this type, the spacer promotes or otherwise 
enhances proteolytic processing and/or presentation of adjacent segment(s). Preferably, the 
spacer comprises at least one amino acid. The at least one amino acid is suitably a neutral 
amino acid. The neutral amino acid is preferably alanine. Alternatively, the at least one
amino acid is cysteine.
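
As a minimal sketch of the linking step with an optional spacer, the following joins randomly reordered segments and interposes a spacer between every pair of neighbours. The function name link_segments, the seed parameter and the three short placeholder segments are hypothetical and serve only to illustrate the idea; interposing the spacer at every junction is one of the options described above, not a requirement.

```python
import random


def link_segments(segments, spacer="", seed=None):
    """Randomly reorder segments and join them into one sequence.

    `spacer` is placed between each pair of adjacent segments; an empty
    string links the segments directly. `seed` makes the shuffle repeatable.
    """
    order = list(segments)
    random.Random(seed).shuffle(order)
    return spacer.join(order)


# Placeholder segments in one-letter amino acid code (not a real antigen),
# linked with a single alanine ("A") as the spacer.
segments = ["MKTGH", "LPQRS", "VWYDE"]
linked = link_segments(segments, spacer="A", seed=0)
```

Passing spacer="" reproduces direct linkage, and a cysteine spacer would be written as spacer="C".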

In a preferred embodiment, segments are selected such that they have partial 
sequence identity or homology with one or more other segments. Suitably, at one or both 
ends of a respective segment there is comprised at least 4 contiguous amino acids, 
preferably at least 7 contiguous amino acids, more preferably at least 10 contiguous amino 
acids, more preferably at least 15 contiguous amino acids and even more preferably at least
20 contiguous amino acids that are identical to, or homologous with, an amino acid 
sequence contained within one or more other of said segments. Preferably, at the or each 
end of a respective segment there is comprised less than 500 contiguous amino acids, more
preferably less than 200 contiguous amino acids, more preferably less than 100 contiguous 
amino acids, more preferably less than 50 contiguous amino acids, more preferably less 
than 40 contiguous amino acids, and even more preferably less than 30 contiguous amino 
acids that are identical to, or homologous with, an amino acid sequence contained within
one or more other of said segments. Such sequence overlap (also referred to elsewhere in 
the specification as "overlapping fragments" or "overlapping segments") is preferable to 
ensure potential epitopes at segment boundaries are not lost and to ensure that epitopes at 
or near segment boundaries are processed efficiently if placed beside or near amino acids 
that inhibit processing. Preferably, the segment size is about twice the size of the overlap.

In a preferred embodiment, when segments have partial sequence homology 
therebetween, the homologous sequences suitably comprise conserved and/or non- 
conserved amino acid differences. Exemplary conservative substitutions are listed in the 
following table. 

TABLES

Original residue      Exemplary substitutions
Ala                   Ser
Arg                   Lys
Asn                   Gln, His
Asp                   Glu
Cys                   Ser
Gln                   Asn
Glu                   Asp
Gly                   Pro
His                   Asn, Gln
Ile                   Leu, Val
Leu                   Ile, Val
Lys                   Arg, Gln, Glu
Met                   Leu, Ile
Phe                   Met, Leu, Tyr
Ser                   Thr
Thr                   Ser
Trp                   Tyr
Tyr                   Trp, Phe
Val                   Ile, Leu



Conserved or non-conserved differences may correspond to polymorphisms in 
corresponding parent polypeptides. Polymorphic polypeptides are expressed by various 
pathogenic organisms and cancers. For example, the polymorphic polypeptides may be 
expressed by different viral strains or clades or by cancers in different individuals.

Sequence overlap between respective segments is preferable to minimise 
destruction of any epitope sequences that may result from any shuffling or rearrangement 
of the segments relative to their existing order in the parent polypeptide. If overlapping 
segments as described above are employed to form a synthetic polypeptide, it may not be
necessary to change the order in which those segments are linked together relative to the
order in which corresponding segments are normally present in the parent polypeptide. In 
this regard, such overlapping segments when linked together in the synthetic polypeptide 
can adopt a different structure relative to the structure of the parent polypeptide, wherein 
the different structure does not provide for one or more functions associated with the
parent polypeptide. For example, in the case of four segments A-B-C-D each spanning 30
contiguous amino acids of the parent polypeptide and having a 10-amino acid overlapping 
sequence with one or more adjacent segments, the synthetic polypeptide will have 
duplicated 10-amino acid sequences bridging segments A-B, B-C and C-D. The presence 
of these duplicated sequences may be sufficient to render a different structure and to
abrogate or alter function relative to the parent polypeptide.
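
The four-segment example above can be illustrated with a small sketch that cuts a parent sequence into 30-residue segments overlapping by 10 residues and then links them in their original order. The function name overlapping_segments and the randomly generated 90-residue placeholder parent are assumptions for illustration only and do not represent any real polypeptide.

```python
import random


def overlapping_segments(sequence, size=30, overlap=10):
    """Cut `sequence` into segments of `size` residues, each sharing
    `overlap` residues with the next; the last segment may be shorter."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the segment size")
    segments = []
    for start in range(0, len(sequence), step):
        segments.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break  # the end of the parent sequence has been covered
    return segments


# Placeholder 90-residue parent built from the 20 one-letter amino acid codes.
amino_acids = "ACDEFGHIKLMNPQRSTVWY"
parent_seq = "".join(random.Random(1).choices(amino_acids, k=90))

segs = overlapping_segments(parent_seq, size=30, overlap=10)  # A, B, C, D
joined = "".join(segs)  # segments linked in their original order
```

Here the four segments cover the 90-residue parent, while the linked product is 120 residues long because each 10-residue overlap appears twice, once at the end of one segment and once at the start of the next. Other segment and overlap sizes can be passed through the size and overlap parameters.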

In a preferred embodiment, segment size is about 30 amino acids and sequence
overlap at one or both ends of a respective segment is about 15 amino acids. However, it 
will be understood that other suitable segment sizes and sequence overlap sizes are 
contemplated by the present invention, which can be readily ascertained by persons of skill 
in the art.

It is preferable but not necessary to utilise all the segments of the parent 
polypeptide in the construction of the synthetic polypeptide. Suitably, at least 30%, 
preferably at least 40%, more preferably at least 50%, even more preferably at least 60%, 
even more preferably at least 70%, even more preferably at least 80% and still even more
preferably at least 90% of the parent polypeptide sequence is used in the construction of
the synthetic polypeptide. However, it will be understood that the more sequence 
information from a parent polypeptide that is utilised to construct the synthetic 
polypeptide, the greater the population coverage will be of the synthetic polypeptide as an 
immunogen. Preferably, no sequence information from the parent polypeptide is excluded
(e.g., because of an apparent lack of immunological epitopes).

Persons of skill in the art will appreciate that when preparing a synthetic 
polypeptide against a pathogenic organism (e.g., a virus) or a cancer, it may be preferable 
to use sequence information from a plurality of different polypeptides expressed by the 
organism or the cancer. Accordingly, in a preferred embodiment, segments from a plurality
of different polypeptides are linked together to form a synthetic polypeptide according to
the invention. It is preferable in this respect to utilise as many parent polypeptides as 
possible from, or in relation to, a particular source in the construction of the synthetic 
polypeptide. The source of parent polypeptides includes, but is not limited to, a pathogenic 
organism and a cancer. Suitably, at least about 30%, preferably at least 40%, more
preferably at least 50%, even more preferably at least 60%, even more preferably at least
70%, even more preferably at least 80% and still even more preferably at least 90% of the
parent polypeptides expressed by the source are used in the construction of the synthetic
polypeptide. Preferably, parent polypeptides from a virus include, but are not restricted to,
latent polypeptides, regulatory polypeptides or polypeptides expressed early during its
replication cycle. Suitably, parent polypeptides from a parasite or bacterium include, but
are not restricted to, secretory polypeptides and polypeptides expressed on the surface of
the parasite or bacterium. It is preferred that parent polypeptides from a cancer or tumour are
cancer specific polypeptides. 

Suitably, hypervariable sequences within the parent polypeptide are excluded 
from the construction of the synthetic polypeptide. 

The synthetic polypeptides of the invention may be prepared by any suitable
procedure known to those of skill in the art. For example, the polypeptide may be
synthesised using solution synthesis or solid phase synthesis as described, for example, in 
Chapter 9 of Atherton and Shephard (1989, Solid Phase Peptide Synthesis: A Practical 
Approach. IRL Press, Oxford) and in Roberge et al (1995, Science 269: 202). Syntheses 
may employ, for example, either t-butyloxycarbonyl (t-Boc) or
9-fluorenylmethyloxycarbonyl (Fmoc) chemistries (see Chapter 9.1 of Coligan et al.,
CURRENT PROTOCOLS IN PROTEIN SCIENCE, John Wiley & Sons, Inc. 1995-1997; 
Stewart and Young, 1984, Solid Phase Peptide Synthesis, 2nd ed. Pierce Chemical Co., 
Rockford, Ill.; and Atherton and Shephard, supra).

Alternatively, the polypeptides may be prepared by a procedure including the
steps of:

(a) preparing a synthetic construct including a synthetic polynucleotide encoding 
a synthetic polypeptide wherein said synthetic polynucleotide is operably linked to a 
regulatory polynucleotide, wherein said synthetic polypeptide comprises a plurality of
different segments of a parent polypeptide, wherein said segments are linked together
in a different relationship relative to their linkage in the parent polypeptide; 

(b) introducing the synthetic construct into a suitable host cell; 

(c) culturing the host cell to express the synthetic polypeptide from said synthetic 
construct; and 

(d) isolating the synthetic polypeptide.

The synthetic construct is preferably in the form of an expression vector. For 
example, the expression vector can be a self-replicating extra-chromosomal vector such as 
a plasmid, or a vector that integrates into a host genome. Typically, the regulatory 
polynucleotide may include, but is not limited to, promoter sequences, leader or signal
sequences, ribosomal binding sites, transcriptional start and stop sequences, translational
start and termination sequences, and enhancer or activator sequences. Constitutive or
inducible promoters as known in the art are contemplated by the invention. The promoters
may be either naturally occurring promoters, or hybrid promoters that combine elements of
more than one promoter. The regulatory polynucleotide will generally be appropriate for
the host cell used for expression. Numerous types of appropriate expression vectors and 
suitable regulatory polynucleotides are known in the art for a variety of host cells. 

In a preferred embodiment, the expression vector contains a selectable marker 
gene to allow the selection of transformed host cells. Selection genes are well known in the 
art and will vary with the host cell used.

The expression vector may also include a fusion partner (typically provided by the 
expression vector) so that the synthetic polypeptide of the invention is expressed as a 
fusion polypeptide with said fusion partner. The main advantage of fusion partners is that 
they assist identification and/or purification of said fusion polypeptide. In order to express 
said fusion polypeptide, it is necessary to ligate a polynucleotide according to the invention
into the expression vector so that the translational reading frames of the fusion partner and 
the polynucleotide coincide. 
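
The reading-frame requirement can be illustrated with a short Python sketch. It is
only a string-level check under stated assumptions: the function name `is_in_frame`,
the idea of a separate "linker" between fusion partner and cloning junction, and the
example sequences are all hypothetical and are not taken from any particular vector.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame(partner_orf: str, linker: str, insert_orf: str) -> bool:
    """Return True if insert_orf is read in frame after partner_orf + linker.

    partner_orf: coding sequence of the fusion partner, without its stop codon.
    linker:      vector sequence between the partner and the cloning junction.
    insert_orf:  coding sequence of the synthetic polynucleotide.
    """
    upstream = (partner_orf + linker).upper()
    insert = insert_orf.upper()
    # The upstream part and the insert must each be a whole number of codons;
    # otherwise the insert is shifted relative to the fusion partner's frame.
    if len(upstream) % 3 != 0 or len(insert) % 3 != 0:
        return False
    # A stop codon upstream of the insert would end translation before it.
    codons = [upstream[i:i + 3] for i in range(0, len(upstream), 3)]
    return not any(codon in STOP_CODONS for codon in codons)
```

A check of this kind only flags frame shifts and premature stops in the upstream
sequence; it does not replace sequencing of the ligated junction.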

Well known examples of fusion partners include, but are not limited to,
glutathione-S-transferase (GST), Fc portion of human IgG, maltose binding protein (MBP)
and hexahistidine (HIS6), which are particularly useful for isolation of the fusion
polypeptide by affinity chromatography. For the purposes of fusion polypeptide
purification by affinity chromatography, the relevant matrices for GST, MBP and HIS6
fusion partners are glutathione-, amylose-, and nickel- or cobalt-conjugated resins
respectively. Many such
matrices are available in "kit" form, such as the QIAexpress™ system (Qiagen) useful with
(HIS6) fusion partners and the Pharmacia GST purification system. In a preferred
embodiment, the recombinant polynucleotide is expressed in the commercial vector
pFLAG™.

Another fusion partner well known in the art is green fluorescent protein (GFP).
This fusion partner serves as a fluorescent "tag" which allows the fusion polypeptide of the
invention to be identified by fluorescence microscopy or by flow cytometry. The GFP tag
is useful when assessing subcellular localisation of a fusion polypeptide of the invention,
or for isolating cells which express a fusion polypeptide of the invention. Flow cytometric
methods such as fluorescence activated cell sorting (FACS) are particularly useful in this
latter application. Preferably, the fusion partners also have protease cleavage sites, such as
for Factor Xa, Thrombin and inteins (protein introns), which allow the relevant protease to
partially digest the fusion polypeptide of the invention and thereby liberate the
recombinant polypeptide of the invention therefrom. The liberated polypeptide can then be
isolated from the fusion partner by subsequent chromatographic separation. Fusion
partners according to the invention also include within their scope "epitope tags", which
are usually short peptide sequences for which a specific antibody is available. Well known
examples of epitope tags for which specific monoclonal antibodies are readily available
include c-Myc, influenza virus haemagglutinin and FLAG tags. Alternatively, a fusion
partner may be provided to promote other forms of immunity. For example, the fusion
partner may be an antigen-binding molecule that is immuno-interactive with a
conformational epitope on a target antigen or with a post-translational modification of a
target antigen (e.g., an antigen-binding molecule that is immuno-interactive with a
glycosylated target antigen).

The step of introducing the synthetic construct into the host cell may be effected 
by any suitable method including transfection, and transformation, the choice of which will 
be dependent on the host cell employed. Such methods are well known to those of skill in 
the art.

Synthetic polypeptides of the invention may be produced by culturing a host cell 
transformed with the synthetic construct. The conditions appropriate for protein expression 
will vary with the choice of expression vector and the host cell. This is easily ascertained 
by one skilled in the art through routine experimentation. 

Suitable host cells for expression may be prokaryotic or eukaryotic. One preferred
host cell for expression of a polypeptide according to the invention is a bacterium. The
bacterium used may be Escherichia coli. Alternatively, the host cell may be an insect cell 
such as, for example, SF9 cells that may be utilised with a baculovirus expression system. 

The synthetic polypeptide may be conveniently prepared by a person skilled in the
art using standard protocols as for example described in Sambrook, et al., MOLECULAR
CLONING. A LABORATORY MANUAL (Cold Spring Harbor Press, 1989), in particular
Sections 16 and 17; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY (John Wiley & Sons, Inc. 1994-1998), in particular Chapters 10 and 16; and
Coligan et al., CURRENT PROTOCOLS IN PROTEIN SCIENCE (John Wiley & Sons,
Inc. 1995-1997), in particular Chapters 1, 5 and 6.

The amino acids of the synthetic polypeptide can be any non-naturally occurring
or any naturally occurring amino acid. Examples of incorporating unnatural amino acids
and derivatives during peptide synthesis include, but are not limited to, use of 4-amino
butyric acid, 6-aminohexanoic acid, 4-amino-3-hydroxy-5-phenylpentanoic acid,
4-amino-3-hydroxy-6-methylheptanoic acid, t-butylglycine, norleucine, norvaline,
phenylglycine, ornithine, sarcosine, 2-thienyl alanine and/or D-isomers of amino acids. A
list of unnatural amino acids contemplated by the present invention is shown in TABLE C.



TABLE C

α-aminobutyric acid                 L-N-methylalanine
α-amino-α-methylbutyrate            L-N-methylarginine
aminocyclopropane-carboxylate       L-N-methylasparagine
aminoisobutyric acid                L-N-methylaspartic acid
aminonorbornyl-carboxylate          L-N-methylcysteine
cyclohexylalanine                   L-N-methylglutamine
cyclopentylalanine                  L-N-methylglutamic acid
L-N-methylisoleucine                L-N-methylhistidine
D-alanine                           L-N-methylleucine
D-arginine                          L-N-methyllysine
D-aspartic acid                     L-N-methylmethionine
D-cysteine                          L-N-methylnorleucine
D-glutamate                         L-N-methylnorvaline
D-glutamic acid                     L-N-methylornithine
D-histidine                         L-N-methylphenylalanine
D-isoleucine                        L-N-methylproline
D-leucine                           L-N-methylserine
D-lysine                            L-N-methylthreonine
D-methionine                        L-N-methyltryptophan
D-ornithine                         L-N-methyltyrosine
D-phenylalanine                     L-N-methylvaline
D-proline                           L-N-methylethylglycine
D-serine                            L-N-methyl-t-butylglycine
D-threonine                         L-norleucine
D-tryptophan                        L-norvaline
D-tyrosine                          α-methyl-aminoisobutyrate
D-valine                            α-methyl-γ-aminobutyrate
D-α-methylalanine                   α-methylcyclohexylalanine
D-α-methylarginine                  α-methylcyclopentylalanine
D-α-methylasparagine                α-methyl-α-naphthylalanine
D-α-methylaspartate                 α-methylpenicillamine
D-α-methylcysteine                  N-(4-aminobutyl)glycine
D-α-methylglutamine                 N-(2-aminoethyl)glycine
D-α-methylhistidine                 N-(3-aminopropyl)glycine
D-α-methylisoleucine                N-amino-α-methylbutyrate
D-α-methylleucine                   α-naphthylalanine
D-α-methyllysine                    N-benzylglycine
D-α-methylmethionine                N-(2-carbamylethyl)glycine
D-α-methylornithine                 N-(carbamylmethyl)glycine
D-α-methylphenylalanine             N-(2-carboxyethyl)glycine
D-α-methylproline                   N-(carboxymethyl)glycine
D-α-methylserine                    N-cyclobutylglycine
D-α-methylthreonine                 N-cycloheptylglycine
D-α-methyltryptophan                N-cyclohexylglycine
D-α-methyltyrosine                  N-cyclodecylglycine
L-α-methylleucine                   L-α-methyllysine
L-α-methylmethionine                L-α-methylnorleucine
L-α-methylnorvaline                 L-α-methylornithine
L-α-methylphenylalanine             L-α-methylproline
L-α-methylserine                    L-α-methylthreonine
L-α-methyltryptophan                L-α-methyltyrosine
L-α-methylvaline                    L-N-methylhomophenylalanine
N-(N-(2,2-diphenylethyl)carbamylmethyl)glycine  N-(N-(3,3-diphenylpropyl)carbamylmethyl)glycine
1-carboxy-1-(2,2-diphenyl-ethylamino)cyclopropane





The invention also contemplates modifying the synthetic polypeptides of the 
invention using ordinary molecular biological techniques so as to alter their resistance to 
proteolytic degradation or to optimise solubility properties or to render them more suitable 
as an immunogenic agent.

3. Preparation of synthetic polynucleotides of the invention 

The invention contemplates synthetic polynucleotides encoding the synthetic 
polypeptides as for example described in Section 2 supra. Polynucleotides encoding 
segments of a parent polypeptide can be produced by any suitable technique. For example, 
such polynucleotides can be synthesised de novo using readily available machinery.
Sequential synthesis of DNA is described, for example, in U.S. Patent No 4,293,652.
Instead of de novo synthesis, recombinant techniques may be employed including use of
restriction endonucleases to cleave a polynucleotide encoding at least a segment of the
parent polypeptide and use of ligases to ligate together in frame a plurality of cleaved
polynucleotides encoding different segments of the parent polypeptide. Suitable
recombinant techniques are described for example in the relevant sections of Ausubel, et
al. (supra) and of Sambrook, et al. (supra) which are incorporated herein by reference.
Preferably, the synthetic polynucleotide is constructed using splicing by overlapping
extension (SOEing) as for example described by Horton et al. (1990, Biotechniques 8(5):
528-535; 1995, Mol Biotechnol 3(2): 93-99; and 1997, Methods Mol Biol 67: 141-149).
However, it should be noted that the present invention is not dependent on, and not 
directed to, any one particular technique for constructing the synthetic construct. 

Various modifications to the synthetic polynucleotides may be introduced as a 
means of increasing intracellular stability and half-life. Possible modifications include but 
are not limited to the addition of flanking sequences of ribo- or deoxy-nucleotides to the 5'
and/or 3' ends of the molecule or the use of phosphorothioate or 2' O-methyl rather than
phosphodiester linkages within the oligodeoxyribonucleotide backbone.

The invention therefore contemplates a method of producing a synthetic 
polynucleotide as broadly described above, comprising linking together in the same
reading frame at least two nucleic acid sequences encoding different segments of a parent
polypeptide to form a synthetic polynucleotide, which encodes a synthetic polypeptide
according to the invention. Suitably, nucleic acid sequences encoding at least 10 segments,
preferably at least 20 segments, more preferably at least 40 segments and even more
preferably at least 100 segments of a parent polypeptide are employed to produce the
synthetic polynucleotide.

Preferably, the method further comprises selecting segments of the parent 
polypeptide, reverse translating the selected segments and preparing nucleic acid 
sequences encoding the selected segments. It is preferred that the method further comprises 
randomly linking the nucleic acid sequences together to form the synthetic polynucleotide. 
The nucleic acid sequences may be oligonucleotides or polynucleotides.

Suitably, segments are selected on the basis of size. Additionally, or in the
alternative, segments are selected such that they have partial sequence identity or
homology (i.e., sequence overlap) with one or more other segments. A number of factors
can influence segment size and sequence overlap as mentioned above. In the case of
sequence overlap, large amounts of duplicated nucleic acid sequences can sometimes result
in sections of nucleic acid being lost during nucleic acid amplification (e.g., polymerase
chain reaction, PCR) of such sequences, recombinant plasmid propagation in a bacterial
host or during amplification of recombinant viruses containing such sequences.
Accordingly, in a preferred embodiment, nucleic acid sequences encoding segments having
sequence identity or homology with one or more other encoded segments are not linked
together in an arrangement in which the identical or homologous sequences are contiguous.
Also, it is preferable that different codons are used to encode a specific amino acid in a
duplicated region. In this context, an amino acid of a parent polypeptide sequence is
preferably reverse translated to provide a codon which, in the context of adjacent or local
sequence elements, has a lower propensity of forming an undesirable sequence (e.g., a
duplicated sequence or a palindromic sequence) that is refractory to the execution of a task
(e.g., cloning or sequencing). Alternatively, segments may be selected such that they
contain a carboxyl terminal leucine residue or such that reverse translated sequences
encoding the segments contain restriction enzyme sites for convenient splicing of the
reverse translated sequences.
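
The selection of overlapping segments and their random linkage, with overlapping
segments kept apart, can be sketched in Python as follows. This is a minimal
illustration only. The function names, the rejection-sampling strategy and the
example segment size and overlap are assumptions made for the sketch, and only
segments that are neighbours in the parent sequence are treated as sharing sequence.

```python
import random


def make_segments(parent: str, size: int, overlap: int) -> list[str]:
    """Cut `parent` into segments of `size` residues overlapping by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    segments = []
    for start in range(0, len(parent), step):
        segments.append(parent[start:start + size])
        if start + size >= len(parent):
            break  # the last segment already reaches the end of the parent
    return segments


def scramble(n: int, rng: random.Random, max_tries: int = 10_000) -> list[int]:
    """Return a random order of range(n) in which no two segments that are
    neighbours in the parent (and so share overlapping sequence) are adjacent."""
    order = list(range(n))
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(abs(a - b) != 1 for a, b in zip(order, order[1:])):
            return order
    # No such order exists for n = 2 or n = 3, so this is raised for them.
    raise RuntimeError("no acceptable order found")
```

For example, `make_segments("ABCDEFGHIJ", 4, 2)` gives four segments, and
`scramble(4, random.Random(0))` returns one of the two orders in which neighbouring
segments are never contiguous.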

The method optionally further comprises linking a spacer oligonucleotide 
encoding at least one spacer residue between segment-encoding nucleic acids. Such spacer 
residue(s) may be advantageous in ensuring that epitopes within the segments are 
processed and presented efficiently. Preferably, the spacer oligonucleotide encodes 2 to 3 
spacer residues. The spacer residue is suitably a neutral amino acid, which is preferably
alanine. 
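
As an illustration of the spacer arrangement, the following Python sketch joins
segment-encoding sequences with an alanine-encoding spacer. The helper name
`link_with_spacer` and the choice of GCC (one of the four alanine codons) are
assumptions made for the example, not a prescribed spacer sequence.

```python
ALANINE_CODON = "GCC"  # assumed choice; any GCN codon encodes alanine


def link_with_spacer(segment_dnas: list[str],
                     n_spacer_residues: int = 2,
                     spacer_codon: str = ALANINE_CODON) -> str:
    """Join in-frame segment sequences with a spacer of repeated codons."""
    if n_spacer_residues < 1:
        raise ValueError("at least one spacer residue is required")
    for dna in segment_dnas:
        if len(dna) % 3 != 0:
            raise ValueError("each segment must be a whole number of codons")
    # The spacer is itself a whole number of codons, so the frame is kept.
    spacer = spacer_codon * n_spacer_residues
    return spacer.join(segment_dnas)
```

Because every segment and the spacer are whole numbers of codons, the joined
sequence stays in a single reading frame.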

Optionally, the method further comprises linking in the same reading frame as 
other segment-containing nucleic acid sequences at least one variant nucleic acid sequence 
which encodes a variant segment having a homologous but not identical amino acid 
sequence relative to other encoded segments. Suitably, the variant segment comprises
conserved and/or non-conserved amino acid differences relative to one or more other
encoded segments. Such differences may correspond to polymorphisms as discussed
above. In a preferred embodiment, degenerate bases are designed or built in to the at least
one variant nucleic acid sequence to give rise to all desired homologous sequences.
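
One way to picture the use of degenerate bases is to collapse a set of aligned
variant sequences into a single sequence written in IUPAC ambiguity codes. The
Python sketch below does this under the assumption that the variants are already
aligned, of equal length and contain only A, C, G and T; the function name
`degenerate_consensus` is hypothetical.

```python
_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
# Map each set of bases to its IUPAC ambiguity code.
IUPAC = {frozenset(bases): code for bases, code in _CODES.items()}


def degenerate_consensus(variants: list[str]) -> str:
    """Collapse aligned, equal-length DNA variants into one degenerate sequence."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be non-empty and of equal length")
    columns = zip(*(v.upper() for v in variants))
    return "".join(IUPAC[frozenset(column)] for column in columns)
```

Note that when degenerate bases occur at more than one position, the resulting
pool contains every combination of those bases, which may include sequences that
were not among the desired variants.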

When a large number of polymorphisms is intended to be covered, it is preferred 
that multiple synthetic polynucleotides are constructed rather than a single synthetic 
polynucleotide, which encodes all variant segments. For example, if there is less than 85%
homology between polymorphic polypeptides, then it is preferred that more than one 
synthetic polynucleotide is constructed. 
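
The 85% guideline can be expressed as a simple check. The sketch below uses percent
identity over pre-aligned sequences of equal length as a stand-in for homology, which
is an assumption of the example; the function names are likewise hypothetical, and
any gap characters are simply counted as mismatches.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def needs_separate_constructs(variants: list[str],
                              threshold: float = 85.0) -> bool:
    """True if any pair of variants falls below the identity threshold."""
    return any(percent_identity(a, b) < threshold
               for a, b in combinations(variants, 2))
```

A result of True would suggest dividing the variants between more than one
synthetic polynucleotide.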

Preferably, the method further comprises optimising the codon composition of the 
synthetic polynucleotide such that it is translated efficiently by a host cell. In this regard, it
is well known that the translational efficiency of different codons varies between
organisms and that such differences in codon usage can be utilised to enhance the level of
protein expression in a particular organism. For example, reference may be made to Seed
et al. (International Application Publication No WO 96/09378) who disclose the
replacement of existing codons in a parent polynucleotide with synonymous codons to
enhance expression of viral polypeptides in mammalian host cells. Preferably, the first or
second most frequently used codons are employed for codon optimisation.
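
Reverse translation with the first or second most frequently used codon can be
sketched as below. The function takes the host's codon ranking as an argument. The
small `TOY_RANKING` table is a placeholder whose ordering is assumed for illustration
and is not measured codon usage data; a real ranking for the intended host would have
to be supplied.

```python
# Placeholder ranking: codons listed per amino acid, most frequent first.
# The ORDER is an assumption for illustration, not real usage data.
TOY_RANKING = {
    "M": ["ATG"],
    "A": ["GCC", "GCT"],
    "K": ["AAG", "AAA"],
}


def reverse_translate(peptide: str,
                      ranked_codons: dict[str, list[str]],
                      use_second: bool = False) -> str:
    """Reverse translate using the host's first or second ranked codon."""
    out = []
    for residue in peptide:
        choices = ranked_codons[residue]
        # Residues with a single codon (e.g. Met) always use that codon.
        index = 1 if use_second and len(choices) > 1 else 0
        out.append(choices[index])
    return "".join(out)
```

Switching between the first and second ranked codon gives two different nucleic
acid sequences for the same peptide, which is one simple way of using different
codons where a region is duplicated.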

Preferably, the synthetic polynucleotide is constructed by gene splicing by overlap
extension or "gene SOEing" (supra), which is a PCR-based method of recombining DNA
sequences without reliance on restriction sites and of directly
generating mutated DNA fragments in vitro. By modifying the sequences incorporated into
the 5'-ends of the primers, any pair of PCR products can be made to share a common
sequence at one end. Under PCR conditions, the common sequence allows strands from
two different fragments to hybridise to one another, forming an overlap. Extension of this
overlap by DNA polymerase yields a recombinant molecule. However, a problem with
long synthetic constructs is that mutations generally incorporate into amplified products
during synthesis. In this instance, it is preferred that resolvase treatment is employed at
various steps of the synthesis. Resolvases are bacteriophage-encoded endonucleases which
recognise disruptions or mispairing of double stranded DNA and are primarily used by
bacteriophages to resolve Holliday junctions (Mizuuchi, 1982; Youil et al., 1995). For
example, T7 endonuclease I can be employed in synthetic DNA constructions to recognise
mutations and cleave corrupted dsDNA. The mutated DNA strands are then hybridised to
non-mutant or correct DNA sequences, which results in a mispairing of DNA bases. The
mispaired bases are recognised by the resolvase, which then cleaves the DNA nearby
leaving only correctly hybridised sequences intact. Preferably a thermostable resolvase
enzyme is employed during splicing or amplification so that errors are not incorporated in
downstream synthesis products.
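
The joining step of gene SOEing, in which two products sharing a common end
sequence are fused through that overlap, can be pictured at the sequence level with
the Python sketch below. It models only the top strand as a string and is not a
simulation of PCR; the function names and the default minimum overlap of 15
nucleotides are assumptions made for the example.

```python
def overlap_length(left: str, right: str, min_overlap: int = 15) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` nucleotides exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def splice_by_overlap(left: str, right: str, min_overlap: int = 15) -> str:
    """Fuse two fragments through their shared end sequence."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        raise ValueError("fragments share no sufficiently long common end")
    # The common sequence is kept once; the rest of `right` extends `left`.
    return left + right[k:]
```

For instance, `splice_by_overlap("AAACCCGGG", "CCGGGTTT", min_overlap=4)` fuses the
two fragments through the shared `CCGGG` sequence.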

Synthetic polynucleotides according to the invention can be operably linked to a 
regulatory polynucleotide in the form of a synthetic construct as for example described in
Section 2 supra. Synthetic constructs of the invention have utility inter alia as nucleic acid
vaccines. The choice of regulatory polynucleotide and synthetic construct will depend on
the intended host.

Exemplary expression vectors for expression of a synthetic polypeptide according 
to the invention include, but are not restricted to, modified Ankara Vaccinia virus as for 
example described by Allen et al. (2000, J. Immunol. 164(9): 4968-4978), fowlpox virus as 
for example described by Boyle and Coupar (1988, Virus Res. 10: 343-356) and the herpes 
simplex amplicons described for example by Fong et al. in U.S. Patent No. 6,051,428.
Alternatively, Adenovirus and Epstein-Barr virus vectors, which are preferably capable of 
accepting large amounts of DNA or RNA sequence information, can be used. 

Preferred promoter sequences that can be utilised for expression of synthetic 
polypeptides include the P7.5 or PE/L promoters as for example disclosed by Kumar and 
Boyle (1990, Virology 179: 151-158), CMV and RSV promoters.

The synthetic construct optionally further includes a nucleic acid sequence 
encoding an immunostimulatory molecule. The immunostimulatory molecule may be a
fusion partner of the synthetic polypeptide. Alternatively, the immunostimulatory molecule
may be translated separately from the synthetic polypeptide. Preferably, the
immunostimulatory molecule comprises a general immunostimulatory peptide sequence.
For example, the immunostimulatory peptide sequence may comprise a domain of an
invasin protein (Inv) from the bacterium Yersinia spp as for example disclosed by Brett et al.
(1993, Eur. J. Immunol. 23: 1608-1614). This immune stimulatory property results from
the capability of this invasin domain to interact with the β1 integrin molecules present on T
cells, particularly activated immune or memory T cells. A preferred embodiment of the
invasin domain (Inv) for linkage to a synthetic polypeptide has been previously described
in U.S. Pat. No. 5,759,551. The said Inv domain has the sequence: Thr-Ala-Lys-Ser-Lys-
Lys-Phe-Pro-Ser-Tyr-Thr-Ala-Thr-Tyr-Gln-Phe [SEQ ID NO: 1467] or is an immune
stimulatory homologue thereof from the corresponding region in another Yersinia species
invasin protein. Such homologues thus may contain substitutions, deletions or insertions of
amino acid residues to accommodate strain to strain variation, provided that the
homologues retain immune stimulatory properties. The general immunostimulatory 
sequence may optionally be linked to the synthetic polypeptide by a spacer sequence. 

In an alternate embodiment, the immunostimulatory molecule may comprise an 
immunostimulatory membrane or soluble molecule, which is suitably a T cell
co-stimulatory molecule. Preferably, the T cell co-stimulatory molecule is a B7 molecule or a
biologically active fragment thereof, or a variant or derivative of these. The B7 molecule
includes, but is not restricted to, B7-1 and B7-2. Preferably, the B7 molecule is B7-1.
Alternatively, the T cell co-stimulatory molecule may be an ICAM molecule such as
ICAM-1 and ICAM-2.

In another embodiment, the immunostimulatory molecule can be a cytokine,
which includes, but is not restricted to, an interleukin, a lymphokine, tumour necrosis
factor and an interferon. Alternatively, the immunostimulatory molecule may comprise an 
immunomodulatory oligonucleotide as for example disclosed by Krieg in U.S. Patent No. 
6,008,200. 

Suitably, the size of the synthetic polynucleotide does not exceed the ability of
host cells to transcribe, translate or proteolytically process and present epitopes to the
immune system. Practitioners in the art will also recognise that the size of the synthetic
polynucleotide can impact on the capacity of an expression vector to express the synthetic
polynucleotide in a host cell. In this connection, it is known that the efficacy of DNA
vaccination reduces with expression vectors greater than 20 kb. In such situations it is
preferred that a larger number of smaller synthetic constructs is utilised rather than a single
large synthetic construct.
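
Dividing segment-encoding sequences between several smaller constructs can be
sketched as a simple greedy packing, shown below in Python. The 20 kb ceiling is
taken from the text above; the function name, the separate `backbone_bp` allowance
for vector sequence and the greedy strategy itself are assumptions of the sketch.

```python
def split_into_constructs(segment_lengths_bp: list[int],
                          backbone_bp: int,
                          max_vector_bp: int = 20_000) -> list[list[int]]:
    """Group segment indices so that each construct stays within the ceiling.

    Segments are taken in the given order and a new construct is started
    whenever the next segment would exceed the remaining capacity.
    """
    capacity = max_vector_bp - backbone_bp
    if capacity <= 0:
        raise ValueError("backbone alone reaches the size ceiling")
    constructs, current, used = [], [], 0
    for index, length in enumerate(segment_lengths_bp):
        if length > capacity:
            raise ValueError(f"segment {index} cannot fit in any construct")
        if used + length > capacity:
            constructs.append(current)
            current, used = [], 0
        current.append(index)
        used += length
    if current:
        constructs.append(current)
    return constructs
```

With three 6 kb inserts and an assumed 5 kb backbone, the first two inserts share
one construct and the third is placed in a second construct.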

4. Immunopotentiating compositions

The invention also contemplates a composition, comprising an 
immunopotentiating agent selected from the group consisting of a synthetic polypeptide as
described in Section 2, and a synthetic polynucleotide or a synthetic construct as described
in Section 3, together with a pharmaceutically acceptable carrier. One or more
immunopotentiating agents can be used as actives in the preparation of
immunopotentiating compositions. Such preparation uses routine methods known to
persons skilled in the art. Typically, such compositions are prepared as injectables, either
as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in,
liquid prior to injection may also be prepared. The preparation may also be emulsified. The
active immunogenic ingredients are often mixed with excipients that are pharmaceutically
acceptable and compatible with the active ingredient. Suitable excipients are, for example,
water, saline, dextrose, glycerol, ethanol, or the like and combinations thereof. In addition,
if desired, the vaccine may contain minor amounts of auxiliary substances such as wetting
or emulsifying agents, pH buffering agents, and/or adjuvants that enhance the effectiveness
of the vaccine. Examples of adjuvants which may be effective include but are not limited
to: aluminium hydroxide, N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-
hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred to as MTP-PE), and RIBI,
which contains three components extracted from bacteria, monophosphoryl lipid A,
trehalose dimycolate and cell wall skeleton (MPL+TDM+CWS) in a 2% squalene/Tween
80 emulsion. For example, the effectiveness of an adjuvant may be determined by
measuring the amount of antibodies resulting from the administration of the composition,
wherein those antibodies are directed against one or more antigens presented by the treated
cells of the composition.

The immunopotentiating agents may be formulated into a composition as neutral 
or salt forms. Pharmaceutically acceptable salts include the acid addition salts (formed
with free amino groups of the peptide) and which are formed with inorganic acids such as,
for example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic,
tartaric, maleic, and the like. Salts formed with the free carboxyl groups may also be
derived from inorganic bases such as, for example, sodium, potassium, ammonium,
calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine,
2-ethylamino ethanol, histidine, procaine, and the like.

If desired, devices or compositions containing the immunopotentiating agents
suitable for sustained or intermittent release could be, in effect, implanted in the body or
topically applied thereto for the relatively slow release of such materials into the body.

The compositions are conventionally administered parenterally, by injection, for 
example, either subcutaneously or intramuscularly. Additional formulations which are
suitable for other modes of administration include suppositories and, in some cases, oral
formulations. For suppositories, traditional binders and carriers may include, for example,
polyalkylene glycols or triglycerides; such suppositories may be formed from mixtures
containing the active ingredient in the range of 0.5% to 10%, preferably 1%-2%. Oral
formulations include such normally employed excipients as, for example, pharmaceutical
grades of mannitol, lactose, starch, magnesium carbonate, and the like. These compositions 
take the form of solutions, suspensions, tablets, pills, capsules, sustained release 
formulations or powders and contain 10%-95% of active ingredient, preferably 25%-70%. 

Administration of the gene therapy construct to said mammal, preferably a 
human, may include delivery via direct oral intake, systemic injection, or delivery to
selected tissue(s) or cells, or indirectly via delivery to cells isolated from the mammal or a
compatible donor. An example of the latter approach would be stem cell therapy, wherein
isolated stem cells having potential for growth and differentiation are transfected with the
vector comprising the Sox18 nucleic acid. The stem cells are cultured for a period and then
transferred to the mammal being treated.

With regard to nucleic acid based compositions, all modes of delivery of such 
compositions are contemplated by the present invention. Delivery of these compositions to 
cells or tissues of an animal may be facilitated by microprojectile bombardment, liposome 
mediated transfection (e.g., lipofectin or lipofectamine), electroporation, calcium
phosphate or DEAE-dextran-mediated transfection, for example. In an alternate
embodiment, a synthetic construct may be used as a therapeutic or prophylactic
composition in the form of a "naked DNA" composition as is known in the art. A
discussion of suitable delivery methods may be found in Chapter 9 of CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY (Eds. Ausubel et al.; John Wiley & Sons
Inc., 1997 Edition) or on the Internet site DNAvaccine.com. The compositions may be
administered by intradermal (e.g., using panjet™ delivery) or intramuscular routes.

The step of introducing the synthetic polynucleotide into a target cell will differ
depending on the intended use and species, and can involve one or more of non-viral and
viral vectors, cationic liposomes, retroviruses, and adenoviruses such as, for example,
described in Mulligan, R.C. (1993, Science 260: 926-932) which is hereby incorporated by
reference. Such methods can include, for example:

A. Local application of the synthetic polynucleotide by injection (Wolff et al., 1990,
Science 247: 1465-1468, which is hereby incorporated by reference), surgical
implantation, instillation or any other means. This method can also be used in
combination with local application by injection, surgical implantation, instillation or
any other means, of cells responsive to the protein encoded by the synthetic
polynucleotide so as to increase the effectiveness of that treatment. This method can
also be used in combination with local application by injection, surgical implantation,
instillation or any other means, of another factor or factors required for the activity of
said protein.

B. General systemic delivery by injection of DNA (Calabretta et al., 1993, Cancer Treat.
Rev. 19: 169-179, which is incorporated herein by reference), or RNA, alone or in
combination with liposomes (Zhu et al., 1993, Science 261: 209-212, which is
incorporated herein by reference), viral capsids or nanoparticles (Bertling et al., 1991,
Biotech. Appl. Biochem. 13: 390-405, which is incorporated herein by reference) or any
other mediator of delivery. Improved targeting might be achieved by linking the
synthetic polynucleotide to a targeting molecule (the so-called "magic bullet" approach
employing, for example, an antibody), or by local application by injection, surgical
implantation or any other means, of another factor or factors required for the activity of
the protein encoded by said synthetic polynucleotide, or of cells responsive to said
protein.

C. Injection or implantation or delivery by any means, of cells that have been modified ex
vivo by transfection (for example, in the presence of calcium phosphate: Chen et al.,
1987, Mole. Cell Biochem. 7: 2745-2752, or of cationic lipids and polyamines: Rose et
al., 1991, BioTech. 10: 520-525, which articles are incorporated herein by reference),
infection, injection, electroporation (Shigekawa et al., 1988, BioTech. 6: 742-751,
which is incorporated herein by reference) or any other way so as to increase the
expression of said synthetic polynucleotide in those cells. The modification can be
mediated by plasmid, bacteriophage, cosmid, viral (such as adenoviral or retroviral;
Mulligan, 1993, Science 260: 926-932; Miller, 1992, Nature 357: 455-460; Salmons et
al., 1993, Hum. Gen. Ther. 4: 129-141, which articles are incorporated herein by
reference) or other vectors, or other agents of modification such as liposomes (Zhu et
al., 1993, Science 261: 209-212, which is incorporated herein by reference), viral
capsids or nanoparticles (Bertling et al., 1991, Biotech. Appl. Biochem. 13: 390-405,
which is incorporated herein by reference), or any other mediator of modification. The
use of cells as a delivery vehicle for genes or gene products has been described by Barr
et al., 1991, Science 254: 1507-1512 and by Dhawan et al., 1991, Science 254: 1509-
1512, which articles are incorporated herein by reference. Treated cells can be
delivered in combination with any nutrient, growth factor, matrix or other agent that
will promote their survival in the treated subject.

Also encompassed by the present invention is a method for treatment and/or
prophylaxis of a disease or condition, comprising administering to a patient in need of such
treatment a therapeutically effective amount of a composition as broadly described above. 
The disease or condition may be caused by a pathogenic organism or a cancer as for 
example described above. 

In a preferred embodiment, the immunopotentiating composition of the invention
is suitable for treatment of, or prophylaxis against, a cancer. Cancers which could be
suitably treated in accordance with the practices of this invention include cancers of the
lung, breast, ovary, cervix, colon, head and neck, pancreas, prostate, stomach, bladder,
kidney, bone, liver, oesophagus, brain, testicle, uterus, melanoma and the various leukemias
and lymphomas.

In an alternate embodiment, the immunopotentiating composition is suitable for
treatment of, or prophylaxis against, a viral, bacterial or parasitic infection. Viral infections
contemplated by the present invention include, but are not restricted to, infections caused
by HIV, Hepatitis, Influenza, Japanese encephalitis virus, Epstein-Barr virus and
respiratory syncytial virus. Bacterial infections include, but are not restricted to, those
caused by Neisseria species, Meningococcal species, Haemophilus species, Salmonella
species, Streptococcal species, Legionella species and Mycobacterium species. Parasitic
infections encompassed by the invention include, but are not restricted to, those caused by
Plasmodium species, Schistosoma species, Leishmania species, Trypanosoma species,
Toxoplasma species and Giardia species.

The above compositions or vaccines may be administered in a manner compatible
with the dosage formulation, and in such amount as is therapeutically effective to relieve
patients of the disease or condition or as is prophylactically effective to prevent
incidence of the disease or condition in the patient. The dose administered to a patient, in
the context of the present invention, should be sufficient to effect a beneficial response in a
patient over time such as a reduction or cessation of blood loss. The quantity of the
composition or vaccine to be administered may depend on the subject to be treated
inclusive of the age, sex, weight and general health condition thereof. In this regard,
precise amounts of the composition or vaccine for administration will depend on the
judgement of the practitioner. In determining the effective amount of the composition or
vaccine to be administered in the treatment of a disease or condition, the physician may
evaluate the progression of the disease or condition over time. In any event, those of skill
in the art may readily determine suitable dosages of the composition or vaccine of the
invention.

In a preferred embodiment, a DNA-based immunopotentiating agent (e.g., 100 µg)
is delivered intradermally into a patient at day 1 and at week 8 to prime the patient. A
recombinant poxvirus (e.g., at 10^7 pfu/mL) from which substantially the same
immunopotentiating agent can be expressed is then delivered intradermally as a booster at
weeks 16 and 24, respectively.

The effectiveness of the immunisation may be assessed using any suitable
technique. For example, CTL lysis assays may be employed using stimulated splenocytes
or peripheral blood mononuclear cells (PBMC) on peptide coated or recombinant virus
infected cells using 51Cr labelled target cells. Such assays can be performed using, for
example, primate, mouse or human cells (Allen et al., 2000, J. Immunol. 164(9): 4968-4978;
also Woodberry et al., infra). Alternatively, the efficacy of the immunisation may be
monitored using one or more techniques including, but not limited to, HLA class I
tetramer staining of both fresh and stimulated PBMCs (see for example Allen et al.,
supra), proliferation assays (Allen et al., supra), Elispot™ assays and intracellular
IFN-gamma staining (Allen et al., supra), ELISA assays for linear B cell responses, and
Western blots of cell samples expressing the synthetic polynucleotides.

5. Computer related embodiments 

The design or construction of a synthetic polypeptide sequence or a synthetic 
polynucleotide sequence according to the invention is suitably facilitated with the
assistance of a computer programmed with software, which inter alia fragments a parent 
sequence into fragments, and which links those fragments together in a different 
relationship relative to their linkage in the parent sequence. The ready use of a parent 
sequence for the construction of a desired synthetic molecule according to the invention 
requires that it be stored in a computer-readable format. Thus, in accordance with the
present invention, sequence data relating to a parent molecule (e.g., a parent polypeptide) 
is stored in a machine-readable storage medium, which is capable of processing the data to 
fragment the sequence of the parent molecule into fragments and to link together the 
fragments in a different relationship relative to their linkage in the parent molecule. 

Therefore, another embodiment of the present invention provides a machine-readable
data storage medium, comprising a data storage material encoded with machine
readable data which, when used by a machine programmed with instructions for using said
data, fragments a parent sequence into fragments, and links those fragments together in a
different relationship relative to their linkage in the parent sequence. In a preferred
embodiment of this type, a machine-readable data storage medium is provided that is
capable of reverse translating the sequence of a respective fragment to provide a nucleic
acid sequence encoding the fragment and of linking together in the same reading frame each
of the nucleic acid sequences to provide a polynucleotide sequence that codes for a
polypeptide sequence in which said fragments are linked together in a different relationship
relative to their linkage in a parent polypeptide sequence.

In another embodiment, the invention encompasses a computer for designing the
sequence of a synthetic polypeptide and/or a synthetic polynucleotide of the invention,
wherein said computer comprises: (a) a machine readable
data storage medium comprising a data storage material encoded with machine readable
data, wherein said machine readable data comprises the sequence of a parent polypeptide;
(b) a working memory for storing instructions for processing said machine-readable data;
(c) a central-processing unit coupled to said working memory and to said machine-readable
data storage medium, for processing said machine-readable data into said synthetic
polypeptide sequence and/or said synthetic polynucleotide; and (d) an output hardware
coupled to said central processing unit, for receiving said synthetic polypeptide sequence
and/or said synthetic polynucleotide.

In yet another embodiment, the invention contemplates a computer program 
product for designing the sequence of a synthetic polynucleotide of the invention, 
comprising code that receives as input the sequence of a parent polypeptide, code that 
fragments the sequence of the parent polypeptide into fragments, code that reverse
translates the sequence of a respective fragment to provide a nucleic acid sequence
encoding the fragment, code that links together in the same reading frame each said nucleic
acid sequence to provide a polynucleotide sequence that codes for a polypeptide sequence
in which said fragments are linked together in a different relationship relative to their
linkage in the parent polypeptide sequence, and a computer readable medium that stores
the codes.
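
By way of a minimal, non-limiting sketch, the reverse translation and in-frame linking codes might be written in Python as shown below. The names CODONS, reverse_translate, link_in_frame and translate are hypothetical and are introduced only for illustration. The codon pairs are valid synonymous codons chosen for the example; they are an assumption and are not the preferred codons of the specification.

```python
# Illustrative codon pairs only (first and second choice per amino acid).
# They are valid synonymous codons but are NOT taken from the specification.
CODONS = {
    "A": ("GCC", "GCT"), "R": ("CGG", "AGA"), "N": ("AAC", "AAT"),
    "D": ("GAC", "GAT"), "C": ("TGC", "TGT"), "Q": ("CAG", "CAA"),
    "E": ("GAG", "GAA"), "G": ("GGC", "GGA"), "H": ("CAC", "CAT"),
    "I": ("ATC", "ATT"), "L": ("CTG", "CTC"), "K": ("AAG", "AAA"),
    "M": ("ATG",),       "F": ("TTC", "TTT"), "P": ("CCC", "CCT"),
    "S": ("AGC", "TCC"), "T": ("ACC", "ACA"), "W": ("TGG",),
    "Y": ("TAC", "TAT"), "V": ("GTG", "GTC"),
}
# Reverse lookup used only to verify the result by translating it back.
DECODE = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def reverse_translate(fragment, choice=0):
    """Return DNA encoding `fragment`, using codon `choice` (0 or 1) where one exists."""
    dna = []
    for aa in fragment:
        options = CODONS[aa]
        dna.append(options[min(choice, len(options) - 1)])
    return "".join(dna)


def link_in_frame(fragments, order):
    """Reverse translate each fragment and join them in `order`.

    Every fragment yields a whole number of codons, so simple
    concatenation keeps all fragments in the same reading frame.
    """
    return "".join(reverse_translate(fragments[i]) for i in order)


def translate(dna):
    """Translate DNA built from CODONS back into a polypeptide sequence."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

For example, link_in_frame(["MKT", "AYW", "GLV"], [2, 0, 1]) gives a 27-base sequence that translates back to the rearranged polypeptide GLVMKTAYW.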

A version of these embodiments is presented in Figure 23, which shows a system 
10 including a computer 11 comprising a central processing unit ("CPU") 20, a working 
memory 22 which may be, e.g., RAM (random-access memory) or "core" memory, mass
storage memory 24 (such as one or more disk drives or CD-ROM drives), one or more
cathode-ray tube ("CRT") display terminals 26, one or more keyboards 28, one or more
input lines 30, and one or more output lines 40, all of which are interconnected by a 
conventional bidirectional system bus 50. 

Input hardware 36, coupled to computer 11 by input lines 30, may be
implemented in a variety of ways. For example, machine-readable data of this invention
may be inputted via the use of a modem or modems 32 connected by a telephone line or
dedicated data line 34. Alternatively or additionally, the input hardware 36 may comprise
CD-ROM drives or disk drives 24. In conjunction with display terminal 26,
keyboard 28 may also be used as an input device.

Output hardware 46, coupled to computer 11 by output lines 40, may similarly be
implemented by conventional devices. By way of example, output hardware 46 may
include CRT display terminal 26 for displaying a synthetic polynucleotide sequence or a
synthetic polypeptide sequence as described herein. Output hardware might also include a
printer 42, so that hard copy output may be produced, or a disk drive 24, to store system
output for later use.

In operation, CPU 20 coordinates the use of the various input and output devices
36, 46, coordinates data accesses from mass storage 24 and accesses to and from working
memory 22, and determines the sequence of data processing steps. A number of programs
may be used to process the machine readable data of this invention. Exemplary programs
may use, for example, the steps outlined in the flow diagram illustrated in Figure 24.
Broadly, these steps include (1) inputting at least one parent polypeptide sequence; (2)
optionally adding alanine spacers at the ends of each polypeptide sequence; (3)
fragmenting the polypeptide sequences into fragments (e.g., 30 amino acids long), which
are preferably overlapping (e.g., by 15 amino acids); (4) reverse translating each fragment to
provide a nucleic acid sequence for each fragment, preferably using for the reverse
translation the first and second most translationally efficient codons for a cell type, wherein the
codons are preferably alternated out of frame with each other in the overlaps of
consecutive fragments; (5) randomly rearranging the fragments; (6) checking whether
rearranged fragments recreate at least a portion of a parent polypeptide sequence; (7)
repeating the random rearrangement of the fragments when rearranged fragments recreate said at
least a portion; or otherwise (8) linking the rearranged fragments together to produce a
synthetic polypeptide sequence and/or a synthetic polynucleotide sequence; and (9)
outputting said synthetic polypeptide sequence and/or synthetic polynucleotide sequence.
An example of an algorithm which uses inter alia the aforementioned steps is shown in
Figure 25. By way of example, this algorithm has been used for the design of synthetic
polynucleotides and synthetic polypeptides according to the present invention for Hepatitis
C 1a and for melanoma, as illustrated in Figures 26 and 27.
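
Steps (5) to (8) above can be sketched in Python as follows. This is a minimal illustration and not the algorithm of Figure 25: the function names, the seed and max_tries parameters, and in particular the test used to decide that rearranged fragments "recreate" a portion of the parent (a junction-spanning window that also occurs in the parent sequence) are assumptions made for the example.

```python
import random


def recreates_parent(order, fragments, parent, window=5):
    """Assumed test for step (6): does any junction between neighbouring
    fragments reproduce a stretch of the parent sequence?

    The junction is the last `window` residues of the left fragment
    followed by the first `window` residues of the right fragment.
    """
    for left, right in zip(order, order[1:]):
        junction = fragments[left][-window:] + fragments[right][:window]
        if junction in parent:
            return True
    return False


def rearrange(fragments, parent, seed=None, max_tries=1000, window=5):
    """Steps (5) to (7): shuffle until no junction recreates the parent."""
    rng = random.Random(seed)  # a seed makes the shuffle reproducible
    order = list(range(len(fragments)))
    for _ in range(max_tries):
        rng.shuffle(order)
        if not recreates_parent(order, fragments, parent, window):
            return order
    raise RuntimeError("no acceptable arrangement found")


def link(fragments, order):
    """Step (8): join the fragments in their rearranged order."""
    return "".join(fragments[i] for i in order)
```

A bounded number of attempts is used so that the loop terminates even when no acceptable arrangement exists, for instance when only one or two fragments are supplied.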

Figure 28 shows a cross section of a magnetic data storage medium 100 which can
be encoded with machine readable data, or set of instructions, for designing a synthetic
molecule of the invention, which can be carried out by a system such as system 10 of
Figure 23. Medium 100 can be a conventional floppy diskette or hard disk, having a
suitable substrate 101, which may be conventional, and a suitable coating 102, which may
be conventional, on one or both sides, containing magnetic domains (not visible) whose
polarity or orientation can be altered magnetically. Medium 100 may also have an opening
(not shown) for receiving the spindle of a disk drive or other data storage device 24. The
magnetic domains of coating 102 of medium 100 are polarised or oriented so as to encode,
in a manner which may be conventional, machine readable data such as that described
herein, for execution by a system such as system 10 of Figure 23.

Figure 29 shows a cross section of an optically readable data storage medium 110
which also can be encoded with such machine-readable data, or set of instructions, for
designing a synthetic molecule of the invention, which can be carried out by a system such
as system 10 of Figure 23. Medium 110 can be a conventional compact disk read only
memory (CD-ROM) or a rewritable medium such as a magneto-optical disk, which is
optically readable and magneto-optically writable. Medium 110 preferably has a suitable
substrate 111, which may be conventional, and a suitable coating 112, which may be
conventional, usually on one side of substrate 111.

In the case of CD-ROM, as is well known, coating 112 is reflective and is
impressed with a plurality of pits 113 to encode the machine-readable data. The
arrangement of pits is read by reflecting laser light off the surface of coating 112. A
protective coating 114, which preferably is substantially transparent, is provided on top of
coating 112.

In the case of a magneto-optical disk, as is well known, coating 112 has no pits 
113, but has a plurality of magnetic domains whose polarity or orientation can be changed 
magnetically when heated above a certain temperature, as by a laser (not shown). The 
orientation of the domains can be read by measuring the polarisation of laser light reflected
from coating 112. The arrangement of the domains encodes the data as described above.

In order that the invention may be readily understood and put into practical effect,
particular preferred non-limiting embodiments will now be described as follows.

EXAMPLES
EXAMPLE 1 

Preparation of an HIV Savine 

Experimental Protocol 
Plasmids

The plasmid pDNAVacc is ampicillin resistant and contains an expression
cassette comprising a CMV promoter and enhancer, a synthetic intron, a multiple cloning
site (MCS) and a SV40 poly A signal sequence (Thomson et al., 1998). The plasmid
pTK7.5 contains a selection cassette, a pox virus 7.5 early/late promoter and a MCS
flanked on either side by Vaccinia virus TK gene sequences.

Recombinant Vaccinia Viruses 

Recombinant Vaccinia viruses expressing the gag, env (IIB) and pol (LAI) genes
of HIV-1 were used as previously described and denoted VV-GAG, VV-POL, VV-ENV
(Woodberry et al., 1999; Kent et al., 1998).

Marker Rescue Recombination

Recombinant Vaccinia viruses containing Savine constructs were generated by 
marker rescue recombination, using protocols described previously (Boyle et al., 1985). 
Plaque purified viruses were tested for the TK phenotype and for the appropriate genome 
arrangement by Southern blot and PCR. 

Oligonucleotides

Oligonucleotides (50 nmol scale, desalted) were purchased from Life
Technologies. Short oligonucleotides were resuspended in 100 µL of water, their
concentration determined, then diluted to 20 for use in PCR or sequencing reactions.
Long oligonucleotides for splicing reactions were denatured for 5 minutes at 94°C in
20 µL of formamide loading buffer, then 0.5 µL was gel purified on a 6% polyacrylamide gel.
Gel slices containing full-length oligonucleotides were visualised with ethidium bromide,
excised, placed in Eppendorf™ tubes and combined with 200 µL of water before being
crushed using the plunger of a 1 mL syringe. Before being used in splicing reactions the
crushed gel was resuspended in an appropriate volume of buffer and 1-2 µL of the
resuspendate used directly in the splicing reactions.

Sequencing 

Sequencing was performed using Dye terminator sequencing reactions and 
analyzed by the Biomedical Resource Facility at the John Curtin School of Medical 
Research using an ABI automated sequencer. 

Restimulation of Lymphocytes from HIV Infected Patients

Two pools of recombinant Vaccinia viruses containing VV-AC1 + VV-BC1 (Pool
1) or VV-AC2 + VV-BC2 + VV-CC2 (Pool 2) were used to restimulate lymphocytes from
the blood samples of HIV-infected patients. Briefly, CTL lines were generated from
HIV-infected donor PBMC. A fifth of the total PBMC were infected with either Pool 1 or Pool 2
Vaccinia viruses then added back to the original cell suspension. The infected cell
suspension was then cultured with IL-7 for 1 week.

CTL Assays 

Restimulated PBMCs were used as effectors in a standard 51Cr-release CTL assay.
Targets were autologous EBV-transformed lymphoblastoid cell lines (LCLs) infected with
the following viruses: Pool 1, Pool 2, VV-GAG, VV-POL or VV-ENV. Assay controls
included uninfected targets, targets infected with VV-lacZ (virus control) and K562 cells.

Results 

HIV Savine Design 

A main goal of the Savine strategy is to include as much protein sequence
information from a pathogen or cancer as possible in such a way that potential T cell
epitopes remain intact and so that the vaccine or therapy is extremely safe. An HIV Savine
is described herein not only to compare this strategy to other strategies but also to produce
an HIV vaccine that would provide the maximum possible population coverage as well as
catering for the major HIV clades.

A number of design criteria were first determined to exploit the many advantages
of using a synthetic approach. One advantage is that it is possible to use consensus protein
sequences to design these vaccines. Using consensus sequences for a highly variable virus
like HIV should provide better vaccine coverage because individual viral isolate sequences
may have lost epitopes which induce CTL against the majority of other viral isolates. Thus,
using the consensus sequences of each HIV clade rather than individual isolate sequences
should provide better vaccine coverage. Taking this one step further, a consensus sequence
that covers all HIV clades should theoretically provide better coverage than using just the
consensus sequences for individual clades. Before designing such a sequence, however, it
was decided that a more appropriate and focussed HIV vaccine might be constructed if the
various clades were first ranked according to their relative importance. To establish such a
ranking the following issues were considered: the current prevalence of each clade, the rate at
which each clade is increasing and the capacity of various regions of the world to cope
with the HIV pandemic (Figures 1 and 2). These criteria produced the following ranking:
clade E > clade A > clade C > clade B > clade D > other clades. Clades E and A were
considered to be almost equal since they are very similar except in their envelope protein
sequences, which differ considerably.

Another advantage of synthesising a designed sequence is that it is possible to
incorporate degenerate sequences into its design. In the case of HIV, this means that
more than one amino acid can be included at various positions to improve the ability of the
vaccine to cater for the various HIV clades and isolates. Coverage is improved because
mutations in different HIV clades and also in individual isolate sequences, while mostly
destroying specific T cell epitopes, can result in the formation of new potentially useful
epitopes nearby (Goulder et al., 1997). Incorporating degenerate amino acid sequences,
however, also means that more than one construct must be made and mixed together. The
number of constructs required depends on the frequency with which mutations are
incorporated into the design. While this approach requires the construction of additional
constructs, these constructs can be prepared from the same set of degenerate long
oligonucleotides, significantly reducing the cost of providing such considerable interclade
coverage.

A set of degeneracy rules was developed for the incorporation of amino acid
mutations into the design which meant that a maximum of eight constructs would be
required so that theoretically all combinations were present, as follows: 1) Two amino
acids at three positions (or less) within any group of nine amino acids (i.e., present in a
CTL epitope); 2) Three amino acids at one position and two at another (or not) within any
group of nine amino acids; 3) Four amino acids at one position and two at another (or not)
within any group of nine amino acids. The reason why these rules were applied to nine
amino acids (the average CTL epitope size) and not to larger stretches of amino acid
sequence to cater for class II-restricted epitopes is that class II-restricted epitopes
generally have a core sequence of nine amino acids in the middle which binds specifically
to class II MHC molecules, with the extra flanking sequences stabilising binding by
associating with either side of class II MHC antigens in a largely sequence independent
manner (Brown et al., 1993).
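
The three degeneracy rules can be checked mechanically. The Python sketch below is illustrative only: it assumes that a degenerate sequence is represented as a list of strings, one per position, each string listing the amino acids allowed at that position, and the names ALLOWED, window_ok, n_combinations and check_degeneracy are hypothetical. Under the rules, the largest number of combinations in any nine-residue window is 2 x 2 x 2 = 8 (rule 1), 3 x 2 = 6 (rule 2) or 4 x 2 = 8 (rule 3), which is why no more than eight constructs are needed.

```python
import math

# Permitted patterns of alternatives per nine-residue window, largest first:
# rule 1 -> up to three positions with 2 amino acids,
# rule 2 -> one position with 3 and optionally one with 2,
# rule 3 -> one position with 4 and optionally one with 2.
ALLOWED = {(), (2,), (2, 2), (2, 2, 2), (3,), (3, 2), (4,), (4, 2)}


def window_ok(window):
    """True if the degenerate positions in `window` follow one of the rules."""
    counts = tuple(sorted(
        (len(set(pos)) for pos in window if len(set(pos)) > 1),
        reverse=True,
    ))
    return counts in ALLOWED


def n_combinations(window):
    """Number of distinct sequences the window can encode."""
    return math.prod(len(set(pos)) for pos in window)


def check_degeneracy(positions, window=9):
    """Return the start index of every window that breaks the rules."""
    last = max(len(positions) - window, 0)
    return [i for i in range(last + 1)
            if not window_ok(positions[i:i + window])]
```

For example, the window ["M", "KR", "A", "ED", "A", "ST", "A", "A", "A"] passes and encodes eight combinations, whereas making a fourth position degenerate within the same window is reported as a violation.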

Using the HIV clade ranking described above, the amino acid degeneracy rules
and in some situations the similarity between amino acids, a degenerate consensus protein
sequence was designed for each HIV protein using the consensus protein sequences for
each HIV clade compiled by the Los Alamos HIV sequence database (Figures 3-11) (HIV
Molecular Immunology Database, 1997). It is important to note that in some situations the
order in which each of the above design criteria was applied was altered. Each time this
was done, however, the primary goal was to increase the ability of the Savine to cater for
interclade differences. Two isolate sequences, GenBank accession U51189 and U46016,
for clade E and clade C, respectively, were used when a consensus sequence for some HIV
proteins from these two clades was unavailable (Gao et al., 1996; Salminen et al., 1996).
The design of a consensus sequence for the hypervariable regions of the HIV envelope
protein and in some cases between these regions (hypervariable regions 1-2 and 3-5) was
difficult and so these regions were excluded from the vaccine design.

Once a degenerate consensus sequence was designed for each HIV protein, an
approach was then determined for incorporating all the protein sequences safely into the
vaccine. One convenient approach to ensure that a vaccine will be safe is to systematically
fragment and randomly rearrange the protein sequences, thus abrogating or
otherwise altering their structure and function. The protein sequences still have to be
immunologically functional, however, meaning that the process used to fragment the
sequences should not destroy potential epitopes. To decide on the best approach for
systematically fragmenting protein sequences, the main criteria used were the size of T cell
epitopes and their processing requirements. Class I-restricted T cell epitopes are 8-10
amino acids long and generally require 2-3 natural flanking amino acids to ensure their
efficient processing and presentation if placed next to unnatural flanking residues (Del Val
et al., 1991; Thomson et al., 1995). Class II-restricted T cell epitopes range between 12-25
amino acids long and do not appear to require natural flanking residues for processing;
however, it is difficult to rule out a role for natural flanking residues in all cases due to the
complexity of their processing pathways (Thomson et al., 1998). Also class II-restricted
epitopes, despite being larger than CTL epitopes, generally have a core sequence of 9-10
amino acids, which binds to MHC molecules in a sequence specific fashion. Thus, based
on current knowledge, it was decided that an advantageous approach was to overlap the
fragments by at least 15 amino acids to ensure that potential epitopes which might lie
across fragment boundaries are not lost and to ensure that CTL epitopes near fragment
boundaries, that are placed beside or near inhibitory amino acids in adjacent fragments, are
processed efficiently. In deciding the optimal fragment size, the main criteria used were
that size had to be small enough to cause the maximum disruption to the structure and
function of proteins but large enough to cover the sequence information as efficiently as
possible without any further unnecessary duplication. Based on these criteria the fragments
would be twice the overlap size, in this case 30 amino acids long.
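
The fragmentation scheme just described, 30-residue fragments overlapping by 15 residues, can be sketched in Python as follows. The function name fragment_sequence and its parameters are hypothetical. Two points are assumptions of this sketch rather than statements of the specification: the optional alanine spacer is added to both ends of the whole sequence before fragmentation, and a final fragment shorter than the nominal size is kept as it is.

```python
def fragment_sequence(seq, size=30, overlap=15, spacer="AA"):
    """Split `seq` into fragments of `size` residues that overlap by `overlap`.

    `spacer` is added to both ends of the sequence first (pass "" to skip).
    The last fragment may be shorter than `size` (an assumption of this sketch).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    padded = spacer + seq + spacer
    step = size - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(padded[start:start + size])
        if start + size >= len(padded):  # this fragment reached the end
            break
        start += step
    return fragments
```

With a 15-residue overlap, every stretch of up to 16 consecutive residues lies wholly within at least one fragment, so a nine-residue core epitope that straddles a fragment boundary is still present intact in a neighbouring fragment.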

The designed degenerate protein sequences were then separated into fragments 30
amino acids long and overlapping by fifteen amino acids. Two alanine amino acids were
also added to the start and end of the first and last fragment for each protein or envelope
protein segment to ensure these fragments were not placed directly adjacent to amino acids
capable of blocking epitope processing (Del Val et al., 1991). The next step was to reverse
translate each protein sequence back into DNA. Duplicating DNA sequences was avoided
when constructing DNA sequences encoding a tandem repeat of identical or homologous
amino acid sequences to maximise expression of the Savine. In this regard, the first and
second most commonly used mammalian codons (shown in Figure 12) were assigned to
amino acids in these repeat regions, wherein a first codon was used to encode an amino
acid in one of the repeated sequences and wherein a second but synonymous codon was
used for the other repeated sequence (e.g., see the gag HIV protein in Figure 13). To cater
for the designed amino acid mutations more than one base was assigned to some positions
using the IUPAC DNA codes without exceeding three base variations (eight
possible combinations) in any group of 27 bases (Figure 12). Where a particular
combination of amino acids could not be incorporated, because too many degenerate bases
would be required, some or all of the amino acid degeneracy was removed according to the
protein consensus design rules outlined above. Also the degenerate codons were checked
to determine if they could encode a stop codon; if stop codons could not be avoided then
the amino acid degeneracy was also simplified, again according to the protein consensus
design rules outlined above.
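
The two checks on degenerate DNA described above can be illustrated with a short Python sketch. The IUPAC nucleotide codes and the three standard stop codons are established facts; the function names expand, may_encode_stop and max_combinations are hypothetical, and reading the limit as "no more than eight combinations in any 27-base window" is an interpretation made for this example.

```python
from itertools import product

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expand(codon):
    """Return every concrete codon a degenerate codon can represent."""
    return {"".join(bases) for bases in product(*(IUPAC[b] for b in codon))}


def may_encode_stop(codon):
    """True if any expansion of the degenerate codon is a stop codon."""
    return bool(expand(codon) & STOP_CODONS)


def max_combinations(dna, window=27):
    """Largest number of sequence combinations found in any window of bases."""
    best = 1
    for i in range(max(len(dna) - window, 0) + 1):
        n = 1
        for base in dna[i:i + window]:
            n *= len(IUPAC[base])
        best = max(best, n)
    return best
```

For instance, the degenerate codon TRG expands to TAG and TGG and would therefore be flagged, whereas AAR expands only to the two lysine codons AAA and AAG.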

The designed DNA segments were then scrambled randomly and joined to create
twenty-two subcassettes approximately 840 bp in size. Extra DNA sequences incorporating
sites for one of the cohesive restriction enzymes XbaI, SpeI, AvrII or NheI and 3 additional
base pairs (to cater for premature Taq polymerase termination) were then added to each
end of each subcassette (Figure 14). Some of these extra DNA sequences also contained
the cohesive restriction sites for SalI or XhoI, Kozak signal sequences and start or stop
codons to enable the subcassettes to be joined and expressed either as three large cassettes
or one full length protein (Figures 14 and 15).

In designing the HIV Savine one issue that required investigation was whether
such a large DNA molecule would be fully expressed and whether epitopes encoded near
the end of the molecule would be efficiently presented to the immune system. The
inventors also wished to show that mixing two or more degenerate Savine constructs
together could induce T cell responses that recognise mutated sequences. To examine both
issues DNA coding for a degenerate murine influenza nucleoprotein CTL epitope, NP365-373,
which differs by two amino acids at positions 71 and 72 in influenza strain A/PR/8/34
compared to the A/NT/60/68 strain and is restricted by H2-Db, was inserted before the last
stop codon at the end of the HIV Savine design (Figure 15). An important and unusual
characteristic of both of these naturally occurring NP365-373 sequences, which enabled
the present inventors to examine the effectiveness of incorporating mutated sequences, is
that they generate CTL responses which do not cross react with the alternate sequence
(Townsend et al., 1986). This is an unusual characteristic because epitopes not destroyed
by mutation usually induce CTL responses that cross-react.

Up to ten long oligonucleotides up to 100 bases long and two short amplification
oligonucleotides were synthesised to enable construction of each subcassette (Life
Technologies). In designing each oligonucleotide the 3' end and in most cases also the 5'
end had to be either a 'c' or a 'g' to ensure efficient extension during PCR splicing. The
overlap region for each long oligonucleotide was designed to be at least 16 bp with
approximately 50% G/C content. Also oligonucleotide overlaps were not placed where
degenerate DNA bases coded for degenerate amino acids to avoid splicing difficulties
later. Where this was too difficult some degenerate bases were removed according to the
protein consensus design rules outlined above and indicated in Figure 12. Figure 16 shows
an example of the oligonucleotide design for each subcassette.
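
The oligonucleotide design constraints can be expressed as a simple checker, sketched below in Python. The names gc_fraction and check_oligo are hypothetical. The sketch assumes that the required 3' base is c or g, and the tolerance of 0.15 around 50% G/C content is an arbitrary value chosen for the example, since the text says only "approximately 50%".

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_oligo(oligo, overlap, min_overlap=16, gc_tolerance=0.15):
    """Return a list of design-rule violations (empty if the oligo passes).

    `oligo` is the full oligonucleotide and `overlap` is the region it
    shares with its neighbour. `gc_tolerance` is an assumed allowance
    around 50% G/C content.
    """
    oligo, overlap = oligo.upper(), overlap.upper()
    problems = []
    if not oligo or oligo[-1] not in "CG":
        problems.append("3' end is not C or G")
    if len(overlap) < min_overlap:
        problems.append("overlap shorter than minimum")
    if overlap and abs(gc_fraction(overlap) - 0.5) > gc_tolerance:
        problems.append("overlap G/C content not close to 50%")
    if set(overlap) - set("ACGT"):
        problems.append("overlap contains degenerate bases")
    return problems
```

Returning a list of violations rather than a single true or false value makes it easier to see which rule a candidate oligonucleotide breaks when the design has to be adjusted.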

Construction of the HIV Savine 

Five of each group of ten designed oligonucleotides were spliced together using
stepwise asymmetric PCR (Sandhu et al., 1992) and Splicing by Overlap Extension
(SOEing) (Figure 17a). Each subcassette was then PCR amplified, cloned into
pBluescript™ II KS- using BamHI/EcoRI and 16 individual clones sequenced. Mutations,
deletions and insertions were present in the large majority of the clones for each
subcassette, despite acrylamide gel purification of the long oligonucleotides. In order to
construct a functional Savine with minimal mutations, two clones for each subcassette with
no insertions or deletions and hence a complete open reading frame and with minimal
numbers of non-designed mutations, were selected from the sixteen available. The
subcassettes were then excised from their plasmids and joined by stepwise PCR-amplified
ligation using the polymerase blend Elongase™ (Life Technologies), T4 DNA ligase and the
cohesive restriction enzymes XbaI/SpeI/AvrII/NheI, to generate two copies of cassettes A,
B and C as outlined in Figure 14 and shown in Figure 17b. Predicted sequences for these
cassettes are shown in Figure 30. Each cassette was then reamplified by PCR with
Elongase™, cloned into pBluescript™ II KS- and 3 of the resulting plasmid clones
sequenced using 12 of the 36 sequencing primers designed to cover the full length
construct. Clones with minimal or no further mutations were selected for transfer into
plasmids for DNA vaccination or used to make recombinant poxviruses. A summary of the
number of designed and non-designed mutations in each Savine construct is presented in
Table 1.

Summary of mutations

                        Mutations   Mutations   Mutations   Non-designed
                        designed    expected    present     mutations
Cassette A      1896    249         124         107         5 (AC1), 8 (AC2)
Cassette B      1184    260         130         124         11 (BC1), 4 (BC2)
Cassette C      1969    276         138         121         10 (CC1), 14 (CC2)
Full length     5742    785         392         352         26 (FL1), 26 (FL2)

Summary of the mutations present in the two full-length clones constructed as determined by
sequencing. Includes the number of mutations designed, expected and actually present in the 2 clones and the
number of non-designed mutations in each cassette and full-length clone.



HIV Savine DNA vaccines and Recombinant Vaccinia viruses 

To test the immunological effectiveness of the HIV Savine constructs the cassette
sequences were transferred into DNA vaccine and poxvirus vectors. These vectors were
used either separately in immunological assays described below or together in a
'prime-boost' protocol which has been shown previously to generate strong T cell responses in
vivo (Kent et al., 1997).

DNA vaccination plasmids were constructed by excising the cassettes from the
selected plasmid clones with XbaI/XhoI (cassette A) or XbaI/SalI (cassettes B and C) and
ligating them into pDNAVacc cut with XbaI/XhoI to create pDVAC1, pDVAC2, pDVBC1,
pDVBC2, pDVCC1 and pDVCC2, respectively (Figure 18a). These plasmids were then
further modified by cloning into their XbaI site a DNA fragment excised using XbaI/AvrII
from pTUMERA2 and encoding a synthetic endoplasmic reticulum (ER) signal sequence
from the Adenovirus E1A protein (Persson et al., 1980) (Figure 18a). ER signal sequences
have been shown previously to enhance the presentation of both CTL and T helper
epitopes in vivo (Ishioka, G.Y., 1999; Thomson et al., 1998). The plasmids pDVERAC1,
pDVERBC1, pDVERCC1 and pDVERAC2, pDVERBC2, pDVERCC2 were then mixed
together to create plasmid pool 1 and pool 2, respectively. Each plasmid pool collectively
encodes one copy of the designed full-length HIV Savine.

Plasmids to generate recombinant Vaccinia viruses which express HIV Savine
sequences were constructed by excising the various HIV Savine cassettes from the selected
plasmid clones using BamHI/XhoI (cassette A) or BamHI/SalI (cassettes B and C) and
cloning them into the marker rescue plasmid, pTK7.5, cleaved with BamHI/SalI. These
pTK7.5-derived plasmids were then used to generate recombinant Vaccinia viruses by marker
rescue recombination using established protocols (Boyle et al., 1985) to generate VV-AC1,
VV-AC2, VV-BC1, VV-BC2, VV-CC1 and VV-CC2 (Figure 18b).

Two further DNA vaccine plasmids were constructed, each encoding a version of
the full length HIV Savine (Figure 18c). Briefly, the two versions of cassette B were
excised with XhoI and cloned into the corresponding selected plasmid clones containing
cassette A sequences that were cut with XhoI/SalI to generate pBSAB1 and pBSAB2,
respectively. The joined A/B cassettes in pBSAB1 and pBSAB2 were excised with
XbaI/XhoI and cloned into pDVCC1 and pDVCC2, respectively, which had been cleaved with
XbaI/XhoI, to generate pDVFL1 and pDVFL2. These were then further modified to contain
an ER signal sequence using the same cloning strategy as outlined in Figure 18a.

Restimulation of HIV specific lymphocytes from HIV infected patients 

The present inventors examined the capacity of the HIV Savine to restimulate
HIV-specific polyclonal CTL responses from HIV-infected patients. PBMCs from three
different patients were restimulated in vitro with two HIV Savine Vaccinia virus pools
(Pool 1 included VV-AC1 and VV-BC1; Pool 2 included VV-AC2, VV-BC2 and VV-CC2)
then used in CTL lysis assays against LCLs infected either with one of the Savine Vaccinia
virus pools or Vaccinia viruses which express gag, env or pol. Figure 19 clearly shows
that, in all three assays, both HIV Savine viral pools restimulated HIV-specific CTL
responses which could recognise targets expressing whole natural HIV antigens and not
targets which were uninfected or infected with the control Vaccinia virus. Furthermore, in
all three cases, both pools restimulated responses that recognised all three natural HIV
antigens. This result suggests that the combined Savine constructs will provide broader
immunological coverage than single antigen based vaccine approaches. The level of lysis
in each case of targets infected with Savine viral pools was significantly higher than the
lysis recorded for any other infected target. This probably reflects the combined CTL
responses to gag, pol, and env plus other HIV antigens not analysed here but whose
sequences are also incorporated into the Savine constructs.

CTL recognition of each HIV antigen is largely controlled by each patient's HLA
background, hence the pattern of CTL lysis for whole HIV antigens is different in each
patient. Interestingly, this CTL lysis pattern did not change when the second Savine
Vaccinia virus pool was used for CTL restimulation. In these assays, therefore, the
inventors were unable to demonstrate clear differences between pools 1 and 2, despite pool
1 lacking a Vaccinia virus expressing cassette CC1 and despite the many amino acid
differences between the A and B cassettes in each pool (see Table 1).

From the foregoing, the present inventors have developed a novel
vaccine/therapeutic strategy. In one embodiment, pathogen or cancer protein sequences are
systematically fragmented, reverse translated back into DNA, rearranged randomly then
joined back together. The designed synthetic DNA sequence is then constructed using long
oligonucleotides and can be transferred into a range of delivery vectors. The vaccine
vectors used here were DNA vaccine plasmids and recombinant poxvirus vectors which
have been previously shown to elicit strong T cell responses when used together in a
'prime-boost' protocol (Kent et al., 1997). An important advantage of scrambled antigen
vaccines or 'Savines' is that the amount of starting sequence information for the design can
be easily expanded to include the majority of the protein sequences from a pathogen or for
cancer, thereby providing the maximum possible vaccine or therapy coverage for a given
population.

An embodiment of the systematic fragmentation approach described herein was
based on the size and processing requirements for T cell epitopes and was designed to
cause maximal disruption to the structure and function of protein sequences. This
fragmentation approach ensures that the maximum possible range of T cell epitopes will be
present from any incorporated protein sequence without the protein being functional and
able to compromise vaccine safety.

Another important advantage of Savines is that consensus protein sequences can
be used for their design. This feature is only applicable when the design needs to cater for
pathogen or cancer antigens whose sequence varies considerably. HIV is a highly
mutagenic virus, hence this feature was utilised extensively to design a vaccine which has
the potential to cover not only field isolates of HIV but also the major HIV clades involved
in the current HIV pandemic. To construct the HIV Savine, one set of long
oligonucleotides was synthesised, which included degenerate bases in such a way that 8
constructs are theoretically required for the vaccine to contain all combinations in any
stretch of 9 amino acids. The inventors believe that this approach can be improved for the
following reasons: 1) While degenerate bases should be theoretically equally represented,
in practice some degenerate bases were biased towards one base or the other, leading to a
lower than expected frequency of the designed mutations in the two full length HIV
Savines which were constructed (see Table 1). 2) Only sequence combinations actually
present in the HIV clade consensus sequences are required to get full clade coverage,
hence the number of full length constructs needed could be reduced. To reduce the number
of constructs however, separate sets of long oligonucleotides would have to be synthesised,
significantly increasing the cost, time and effort required to generate a vaccine capable of
such considerable vaccine coverage.
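
By way of illustration of how degenerate bases multiply the number of encoded sequences, the following Python sketch expands an oligonucleotide written with standard IUPAC degenerate codes into every concrete sequence it represents. It is a minimal sketch and not the inventors' design software; the function names and the example oligonucleotide are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_variants(oligo):
    """Number of concrete sequences encoded by a degenerate oligonucleotide."""
    total = 1
    for base in oligo.upper():
        total *= len(IUPAC[base])
    return total


def expand(oligo):
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    example = "ARGYTTKCA"  # hypothetical oligo with three two-fold degenerate positions
    print(count_variants(example), "variants")
    for sequence in expand(example):
        print(sequence)
```

An oligonucleotide with three two-fold degenerate positions, as in the hypothetical example, expands to 2 x 2 x 2 = 8 sequences. In a real synthesis the variants need not be equally represented, as noted in point 1) above.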

A significant problem during the construction of the HIV Savine synthetic DNA
sequence was the incorporation of non-designed mutations. The most serious types of
mutations were insertions, deletions or those giving rise to stop codons, all of which
change the frame of the synthesised sequences and/or cause premature truncation of the
Savine proteins. These types of mutation were removed during construction of the HIV
Savines by sequencing multiple clones after subcassette and cassette construction and
selecting functional clones. The major source of these non-designed mutations was in the
long oligonucleotides used for Savine synthesis, despite their gel purification. This
problem could be reduced by making the initial subcassettes smaller, thereby reducing the
possibility of corrupted oligonucleotides being incorporated into each subcassette clone.
The second major cause of non-designed mutations was the large number of PCR cycles
required for the PCR and ligation-mediated joining of the subcassettes. Including extra
sequencing and clone selection steps during the subcassette joining process should help to
reduce the frequency of non-designed mutations in future constructs. Finally, another
method that could help reduce the frequency of such mutations at all stages is to use
resolvase treatment. Resolvases are bacteriophage-encoded endonucleases which recognise
disruptions to double stranded DNA and are primarily used by bacteriophages to resolve
Holliday junctions (Mizuuchi, 1982; Youil et al., 1995). T7 endonuclease I has already
been used by the present inventors in synthetic DNA constructions to recognise mutations
and cleave corrupted dsDNA to allow gel purification of correct sequences. Cleavage of
corrupted sequences occurs because, after a simple denaturing and hybridisation step,
mutated DNA hybridises to correct DNA sequences and results in a mispairing of DNA
bases which is able to be recognised by the resolvase. This method resulted in a 50%
reduction in the frequency of errors. Further optimisation of this method and the use of a
thermostable version of this type of enzyme could further reduce the frequency of errors
during long Savine construction.

Two pools of Vaccinia viruses expressing Savine cassettes were both shown to
restimulate HIV-specific responses from three different patients infected with B clade HIV
viruses. These results provide a clear indication that the HIV Savine should provide broad
coverage of the population because each patient had a different HLA pattern yet both pools
were able to restimulate HIV-specific CTL responses in all three patients against all three
natural HIV proteins tested. Also, both pools were shown to restimulate virtually identical
CTL patterns in all three patients. This result was unexpected because some responses
should have been lost or gained due to the amino acid differences between the two pools
and because Pool 1 is only capable of expressing 2/3 of the full length HIV Savine. There
are two suggested reasons why the pattern of CTL lysis was not altered between the two
viral pools. Firstly, the sequences in the Savine constructs are nearly all duplicated because
the fragment sequences overlap. Hence the loss of a third of the Savine may not have
excluded sufficient T cell epitopes for differences to be detected in only three patient
samples against only three HIV proteins. Secondly, while mutations often destroy T cell
epitopes, if they remain functional, then the CTL they generate frequently can recognise
alternate epitope sequences. Taken together, these findings indirectly suggest that combining
only two Savine constructs may provide robust multiclade coverage. Further experiments
are being carried out to directly examine the capacity of the HIV Savine to stimulate CTL
generated by different strains of HIV virus. The capacity of the two HIV-1 Savine
Vaccinia vector pools to stimulate CD4+ T cell HIV-1 specific responses from infected
patients was also tested (Figure 20). Both patients showed significant proliferation of
CD4+ T cells, although the two pools did not show consistent patterns, suggesting that the two
pools may provide wider vaccine coverage than using either pool independently.
The present inventors have generated a novel vaccine strategy, which has been
used to generate what the inventors believe to be the most effective HIV candidate vaccine
to date. The inventors have used this vaccine to immunise naive mice. Figure 21 shows
conclusively that the HIV-1 Savine described above can generate a Gag and Nef CTL
response in naive mice. It should be noted, however, that the Nef CTL epitope appeared to
exist only in Pool 1 since it was not restimulated by Pool 2. This is further proof of the
utility of combining HIV-1 Savine Pool 1 and Pool 2 components together to provide
broader vaccine coverage.

The HIV-1 Savine Vaccinia vectors have also been used to restimulate in vivo
HIV-1 responses in pre-immune M. nemestrina monkeys. These experiments (Figure 22)
showed, by IFN-γ ELISPOT and CD69 expression on both CD4 and CD8 T cells, that the
ability of the HIV-1 Savine to restimulate HIV-1 specific responses in vivo is equivalent
to, or perhaps better than, that of another HIV-1 candidate vaccine.

This is a generic strategy able to be applied to many other human infections or
cancers where T-cell responses are considered to be important for protection or recovery.
With this in mind the inventors have begun constructing Savines for melanoma, cervical
cancer and Hepatitis C. In the case of melanoma, the majority of the currently identified
melanoma antigens have been divided into two groups, one containing antigens associated
with melanoma and one containing differentiation antigens from melanocytes, which are
often upregulated in melanomas. Two Savine constructs are presently being constructed to
cater for these two groups. The reason for making the distinction is that treatment of
melanoma might first proceed using the Savine that incorporates fragments of melanoma
specific antigens only. If this Savine fails to control some metastases then the less specific
Savine containing the melanocyte-specific antigens can then be used. It is important to
point out that other cancers also express many of the antigens specific to melanomas, e.g.,
testicular and breast cancers. Hence the melanoma specific Savine may have therapeutic
benefits for other cancers.

A small Savine is also being constructed for cervical cancer. This Savine will
contain two antigens, E6 and E7, from two strains of human papilloma virus (HPV),
HPV-16 and HPV-18, directly linked with causing the majority of cervical cancers worldwide.
There is a large number of sequence differences in these two antigens between the two
strains which would normally require two Savines to be constructed. However, since this
Savine is small, the antigen fragments from both strains are being scrambled together.
While it is normally better for the Savine approach to include all or a majority of the
antigens from a virus, in this case only E6 and E7 are expressed during viral latency or in
cervical carcinomas. Hence in the interests of simplicity, the rest of the HPV genome will
not be included although all HPV antigens would be desirable in a Savine against genital
warts.

Two Savines have also been constructed for two strains of hepatitis C, a major
cause of liver disease in the world. Hepatitis C is similar to HIV in the requirements for a
vaccine or therapeutic. However, the major hepatitis C strains share significantly lower
homology, 69-79%, with one another than do the various HIV clades. To cater for this the
inventors have decided to construct two separate constructs to cater for the two major
strains present in Australia, types 1a and 3a, which together cause approximately 80-95% of
hepatitis C infections in this country. Both constructs will be approximately the same size
as the HIV Savine but will be blended together into a single vaccine or therapy.

Overall it is believed that the Savine vaccine strategy is a generic technology
likely to be applied to a wide range of human diseases. It is also believed that because it is
not necessary to characterise each antigen, this technology will be actively applied to
animal vaccines as well, where research into vaccines or therapies is often inhibited by the
lack of specific reagents, modest research budgets and poor returns on animal vaccines.

EXAMPLE 2 
Hepatitis C Savine 

Synthetic immunomodulatory molecules have also been designed for treating
Hepatitis C. In one example, the algorithm of Figure 25 was applied to a consensus
polyprotein sequence of Hepatitis C 1a to facilitate its segmentation into overlapping
segments (30 aa segments overlapping by 15 aa), the rearrangement of these segments into
a scrambled order and the output of Savine nucleic acid and amino acid sequences, as
shown in Figure 26. Exemplary DNA cassettes (A, B and C) are also shown in Figure 26,
which contain suitable restriction enzyme sites at their ends to facilitate their joining into a
single expressible open reading frame.
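
The segmentation and scrambling steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the algorithm of Figure 25: the function names, the handling of a short final segment, the fixed random seed and the toy input sequence are all hypothetical, and reverse translation and cassette design are omitted.

```python
import random


def segment(sequence, size=30, overlap=15):
    """Split a protein sequence into segments of `size` residues,
    each overlapping the previous one by `overlap` residues.
    The final segment may be shorter (an assumption of this sketch)."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than segment size")
    segments = []
    for start in range(0, len(sequence), step):
        segments.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break  # the sequence end has been reached
    return segments


def scramble(segments, seed=0):
    """Return the segments in a random order, leaving the input unchanged."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    shuffled = list(segments)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 4  # hypothetical 80-residue input
    pieces = segment(toy_protein)
    synthetic = "".join(scramble(pieces))
    print(len(pieces), "segments;", len(synthetic), "residues in scrambled product")
```

Because adjacent segments share 15 residues, nearly every residue of the parent sequence appears twice in the scrambled product, which is why any epitope shorter than the overlap remains intact in at least one segment.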
EXAMPLE 3
Melanoma Savine

The algorithm of Figure 25 was also applied to melanocyte differentiation
antigens (gp100, MART, TRP-1, Tyros, Trp-2, MC1R, MUC1F and MUC1R) and to
melanoma specific antigens (BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3, PRAME,
TRP2IN2, NYNSO1a, NYNSO1b and LAGE1), as shown in Figure 27, to provide separate
Savine nucleic acid and amino acid sequences for treating or preventing melanoma.

EXAMPLE 4 

Resolvase Repair Experiment

A resolvase can be used advantageously to repair errors in polynucleotides. The
following procedure outlines resolvase repair of a synthetic 340 bp fragment in which
DNA errors were common.

Method 

The 340 bp fragment was PCR amplified and gel purified on a 4% agarose gel.
After spin purifying, 10 µL of the eluate, corresponding to approximately 100 ng, was
subjected to the resolvase repair treatment. The rest of the DNA sample was stored for later
cloning as the untreated control.

2 µL of 10x PCR buffer, 2 µL of 20 mM MgCl2 and 6 µL of MilliQ™ water
(MQW) and Taq DNA polymerase were added to the 10 µL DNA sample. The mixture
was subjected to the following thermal profile: 95°C for 5 min, 65°C for 30 min, cooled and
held at 37°C. Five µL of 10x T7 endonuclease I buffer, 8 µL of the 1/50-diluted T7endoI enzyme
stock and 17 µL of MQW were added, mixed and incubated for 30 min. Loading buffer
was added to the sample and the sample was electrophoresed on a 4% agarose gel. A faint
band corresponding to the full length fragment was excised and subjected to 15 further
cycles of PCR. The amplified fragment was agarose gel purified and, along with the
untreated DNA sample, cloned into pBluescript. Eleven plasmid clones for each DNA
sample were sequenced and the number and type of errors compared (see Tables 2 and 3).
Buffers were as follows:

10x T7 endonuclease buffer

2.5 mL 1 M TRIS pH 7.8, 0.5 mL 1 M MgCl2, 25 µL 1 M DTT, 50 µL 10 mg/mL BSA,
2 mL MQW made up to a total of 5 mL.

T7 endonuclease I stock

Concentrated sample of enzyme prepared by, and obtained from, Jeff Babon (St
Vincent's Hospital) was diluted 1/50 using the following dilution buffer: 50 µL 1 M TRIS
pH 7.8, 0.1 µL 1 M EDTA pH 8, 5 µL 100 mM glutathione, 50 µL 10 mg/mL BSA, 2.3 mL
MQW, 2.5 mL glycerol made up to a total of 5 mL.
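
As a check on the volumes given in the method, the following Python sketch works out the component concentrations of the 10x T7 endonuclease buffer and confirms that the digest is at 1x buffer strength. It is an illustrative calculation only: the names are hypothetical, the nominal 5 mL buffer volume is taken from the recipe, and the unstated volume of Taq DNA polymerase is assumed to be negligible.

```python
def diluted(stock_conc, stock_vol, total_vol):
    """Concentration after dilution (C1 * V1 = C2 * V2)."""
    return stock_conc * stock_vol / total_vol


# 10x T7 endonuclease buffer: stock concentration and volume (mL) per component,
# made up to a nominal total of 5 mL.
BUFFER_TOTAL_ML = 5.0
ten_x_buffer = {
    "TRIS pH 7.8 (mM)": diluted(1000.0, 2.5, BUFFER_TOTAL_ML),
    "MgCl2 (mM)": diluted(1000.0, 0.5, BUFFER_TOTAL_ML),
    "DTT (mM)": diluted(1000.0, 0.025, BUFFER_TOTAL_ML),
    "BSA (mg/mL)": diluted(10.0, 0.050, BUFFER_TOTAL_ML),
}

# Digest volumes in µL. The Taq polymerase volume is not stated and is ignored.
digest_ul = {
    "DNA sample": 10.0,
    "10x PCR buffer": 2.0,
    "20 mM MgCl2": 2.0,
    "MQW (annealing step)": 6.0,
    "10x T7 endonuclease I buffer": 5.0,
    "T7endoI stock (1/50)": 8.0,
    "MQW (digest step)": 17.0,
}

digest_total_ul = sum(digest_ul.values())
buffer_strength = diluted(10.0, digest_ul["10x T7 endonuclease I buffer"], digest_total_ul)

if __name__ == "__main__":
    for component, conc in ten_x_buffer.items():
        print(f"10x buffer {component}: {conc:g}")
    print(f"Digest volume: {digest_total_ul:g} µL at {buffer_strength:g}x T7 buffer")
```

These figures cover the T7 endonuclease buffer contribution only; the magnesium carried over from the preceding annealing mix is not included.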

10 Results 

The results are summarised in Tables 2 and 3. 



TABLE 2

Error type          Untreated    T7 endonuclease I treated
A/T to G/C          6            1
G/C to A/T          12           7
A/T to deletion     1            1
G/C to deletion     6            3


TABLE 3

Untreated                   T7 endonuclease I treated
6/11 contained deletions    3/11 contained deletions
9/11 contained mutations    7/11 contained mutations
2/11 correct                3/11 correct
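
The overall change in error frequency can be tallied from the counts in Table 2. The Python sketch below is illustrative only; the names are hypothetical, and it assumes that the first figure of each pair in Table 2 belongs to the untreated sample and the second to the resolvase-treated sample, which is inferred from the stated reduction in errors.

```python
# Error counts from Table 2. Which column is untreated and which is treated
# is an assumption of this sketch.
untreated = {"A/T to G/C": 6, "G/C to A/T": 12, "A/T to deletion": 1, "G/C to deletion": 6}
treated = {"A/T to G/C": 1, "G/C to A/T": 7, "A/T to deletion": 1, "G/C to deletion": 3}


def total_errors(counts):
    """Total number of errors across all error types."""
    return sum(counts.values())


def percent_reduction(before, after):
    """Percentage fall from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    before = total_errors(untreated)
    after = total_errors(treated)
    print(f"Errors: {before} untreated, {after} treated "
          f"({percent_reduction(before, after):.0f}% reduction)")
```

Under that assumption the totals are 25 and 12 errors across the eleven clones of each sample, in line with the approximately 50% reduction in error frequency reported for this method.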



Discussion/Conclusion 

While overall the number of correct clones obtained was not significantly
different, there was a significant difference in the level of errors. This reduction in errors
becomes more significant as greater numbers of long oligonucleotides are joined into the
one construct, i.e., the difference between untreated and treated samples in the
chance of obtaining a correct clone increases. It is believed that combining another resolvase such as
T4 endonuclease VII may further enhance repair or increase the bias against errors.

Importantly, this experiment was not optimised, e.g., by using proofreading PCR
enzymes or optimised conditions. Finally, if the repair reaction is carried out during normal
PCR, for example, by including a thermostable resolvase, it is believed that amplification
of already damaged long oligonucleotides, and the normal accumulation of PCR-induced
errors, even using proofreading polymerases during PCR, could be reduced significantly.
The repair of damaged long oligonucleotides is particularly important for synthesis of long
DNA fragments such as in Savines because, while the rate of long oligonucleotide damage
is typically <5%, after joining 10 oligonucleotides, the error rate approaches 50%. This is
true even using the best proofreading PCR enzymes because these enzymes do not verify
the sequence integrity using correct oligonucleotide templates that exist as a significant
majority (95%) in a joining reaction.
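
The way a small per-oligonucleotide damage rate compounds over a joined construct can be shown with a short calculation. The Python sketch below is a simplified model, not a result from the experiment: it assumes that each oligonucleotide is damaged independently with the same probability, and the function name is hypothetical.

```python
def construct_error_rate(per_oligo_damage, n_oligos):
    """Probability that a construct joined from n oligonucleotides contains
    at least one damaged oligonucleotide, assuming independent damage."""
    return 1.0 - (1.0 - per_oligo_damage) ** n_oligos


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        rate = construct_error_rate(0.05, n)
        print(f"{n:2d} oligonucleotides at 5% damage: {rate:.1%} of constructs affected")
```

With 5% damage per oligonucleotide, about 40% of constructs joined from 10 oligonucleotides would carry at least one damaged oligonucleotide under this model, before any PCR-induced errors are counted, which is of the same order as the figure approaching 50% given above.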

The disclosure of every patent, patent application, and publication cited herein is
incorporated herein by reference in its entirety.

The citation of any reference herein should not be construed as an admission that
such reference is available as "Prior Art" to the instant application.

Throughout the specification the aim has been to describe the preferred
embodiments of the invention without limiting the invention to any one embodiment or
specific collection of features. Those of skill in the art will therefore appreciate that, in
light of the instant disclosure, various modifications and changes can be made in the
particular embodiments exemplified without departing from the scope of the present
invention. All such modifications and changes are intended to be included within the
scope of the appended claims.
WHAT IS CLAIMED IS:

1. A synthetic polypeptide comprising a plurality of different segments of at least one 
parent polypeptide, wherein the segments are linked together in a different relationship 
relative to their linkage in the at least one parent polypeptide to impede, abrogate or 
otherwise alter at least one function associated with the parent polypeptide. 

2. The synthetic polypeptide of claim 1, consisting essentially of different segments of a 
single parent polypeptide. 

3. The synthetic polypeptide of claim 1, consisting essentially of different segments of a 
plurality of different parent polypeptides. 

4. The synthetic polypeptide of claim 1, wherein the segments in said synthetic 
polypeptide are linked sequentially in a different order or arrangement relative to their 
linkage in said at least one parent polypeptide. 

5. The synthetic polypeptide of claim 4, wherein the segments in said synthetic 
polypeptide are randomly rearranged relative to their order or arrangement in said at least 
one parent polypeptide. 

6. The synthetic polypeptide of claim 1, wherein the size of an individual segment is at 
least 4 amino acids. 

7. The synthetic polypeptide of claim 6, wherein the size of an individual segment is from 
about 20 to about 60 amino acids. 

8. The synthetic polypeptide of claim 7, wherein the size of an individual segment is 
about 30 amino acids. 

9. The synthetic polypeptide of claim 7, comprising at least 30% of the parent polypeptide 
sequence. 

10. The synthetic polypeptide of claim 1, wherein at least one of said segments comprises 
partial sequence identity or homology to one or more other said segments. 

11. The synthetic polypeptide of claim 10, wherein the sequence identity or homology is 
contained at one or both ends of an individual segment. 
12. The synthetic polypeptide of claim 11, wherein one or both ends of said segment
comprises at least 4 contiguous amino acids that are identical to, or homologous with, an 
amino acid sequence contained within one or more other of said segments. 

13. The synthetic polypeptide of claim 10, wherein the size of an individual segment is 
about twice the size of the sequence that is identical or homologous to the or each other 
said segment. 

14. The synthetic polypeptide of claim 13, wherein the size of an individual segment is 
about 30 amino acids and the size of the sequence that is identical or homologous to the or 
each other said segment is about 15 amino acids. 

15. The synthetic polypeptide of claim 1, wherein an optional spacer is interposed between 
some or all of the segments. 

16. The synthetic polypeptide of claim 15, wherein the spacer alters proteolytic processing 
and/or presentation of adjacent segment(s). 

17. The synthetic polypeptide of claim 16, wherein the spacer comprises at least one
neutral amino acid.

18. The synthetic polypeptide of claim 16, wherein the spacer comprises at least one 
alanine residue. 

19. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is 
associated with a disease or condition. 

20. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is
selected from a polypeptide of a pathogenic organism, a cancer-associated polypeptide, an 
autoimmune disease-associated polypeptide, an allergy-associated polypeptide or a variant 
or derivative of these. 

21. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is a 
polypeptide of a virus. 

22. The synthetic polypeptide of claim 21, wherein the virus is selected from a Human 
Immunodeficiency Virus (HIV) or a Hepatitis virus. 

23. The synthetic polypeptide of claim 22, wherein the virus is a Human 
Immunodeficiency Virus (HIV) and the at least one parent polypeptide is selected from 
env, gag, pol, vif, vpr, tat, rev, vpu and nef, or a combination thereof. 
24. The synthetic polypeptide of claim 1, wherein the at least one parent polypeptide is a
cancer-associated polypeptide. 

25. The synthetic polypeptide of claim 24, wherein the cancer is melanoma. 

26. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanocyte differentiation antigen. 

27. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanocyte differentiation antigen selected from gp100, MART, TRP-1, Tyros, TRP2,
MC1R, MUC1F, MUC1R or a combination thereof. 

28. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanoma-specific antigen. 

29. The synthetic polypeptide of claim 25, wherein the at least one parent polypeptide is a 
melanoma-specific antigen selected from BAGE, GAGE-1, gp100In4, MAGE-1, MAGE-3,
PRAME, TRP2IN2, NYNSO1a, NYNSO1b, LAGE1 or a combination thereof.

30. A synthetic polynucleotide encoding a synthetic polypeptide comprising a plurality of 
different segments of at least one parent polypeptide, wherein the segments are linked 
together in a different relationship relative to their linkage in the at least one parent 
polypeptide to impede, abrogate or otherwise alter at least one function associated with the 
parent polypeptide. 

31. A method for producing the synthetic polynucleotide encoding a synthetic polypeptide 
comprising a plurality of different segments of at least one parent polypeptide, wherein the 
segments are linked together in a different relationship relative to their linkage in the at 
least one parent polypeptide to impede, abrogate or otherwise alter at least one function 
associated with the parent polypeptide, said method comprising: 

- linking together in the same reading frame a plurality of nucleic acid sequences 
encoding different segments of the at least one parent polypeptide to form a synthetic 
polynucleotide whose sequence encodes said segments linked together in a different 
relationship relative to their linkage in the at least one parent polypeptide. 

32. The method of claim 31, further comprising fragmenting the sequence of a respective 
parent polypeptide into fragments and linking said fragments together in a different 
relationship relative to their linkage in a respective parent polypeptide sequence. 
33. The method of claim 32, wherein the fragments are randomly linked together.

34. The method of claim 31, further comprising reverse translating the sequence of a 
respective parent polypeptide or a segment thereof to provide a nucleic acid sequence 
encoding said parent polypeptide or said segment. 

35. The method of claim 34, wherein an amino acid of a respective parent polypeptide 
sequence is reverse translated to provide a codon, which has higher translational efficiency 
than other synonymous codons in a cell of interest. 

36. The method of claim 35, wherein an amino acid of said parent polypeptide sequence is 
reverse translated to provide a codon which, in the context of adjacent or local sequence 
elements, has a lower propensity of forming an undesirable sequence that is refractory to 
the execution of a task. 

37. The method of claim 35, wherein an amino acid of said parent polypeptide sequence is 
reverse translated to provide a codon which, in the context of adjacent or local sequence 
elements, has a lower propensity of forming an undesirable sequence selected from a 
palindromic sequence or a duplicated sequence, which is refractory to the execution of a 
task selected from cloning or sequencing. 

38. The method of claim 31, further comprising linking a spacer oligonucleotide encoding 
at least one spacer residue between segment-encoding nucleic acids. 

39. The method of claim 38, wherein the spacer oligonucleotide encodes 2 to 3 spacer
residues. 

40. The method of claim 38 or claim 39, wherein the spacer residue is a neutral amino acid. 

41. The method of claim 38 or claim 39, wherein the spacer residue is alanine.

42. The method of claim 31, further comprising linking in the same reading frame as other 
segment-containing nucleic acid sequences at least one variant nucleic acid sequence 
which encodes a variant segment having a homologous but not identical amino acid 
sequence relative to other encoded segments. 
43. The method of claim 42, wherein the variant segment comprises conserved and/or
non-conserved amino acid differences relative to one or more other encoded segments.

44. The method of claim 43, wherein the differences correspond to sequence 
polymorphisms. 

45. The method of claim 44, wherein degenerate bases are designed or built in to the at 
least one variant nucleic acid sequence to give rise to all desired homologous sequences. 

46. The method of claim 31, further comprising optimising the codon composition of the 
synthetic polynucleotide such that it is translated efficiently by a host cell. 

47. A synthetic construct comprising a synthetic polynucleotide encoding a synthetic 
polypeptide comprising a plurality of different segments of at least one parent polypeptide, 
wherein the segments are linked together in a different relationship relative to their linkage 
in the at least one parent polypeptide to impede, abrogate or otherwise alter at least one 
function associated with the parent polypeptide, wherein said synthetic polynucleotide is 
operably linked to a regulatory polynucleotide. 

48. The synthetic construct of claim 47, further including a nucleic acid sequence encoding 
an immunostimulatory molecule. 

49. The synthetic construct of claim 48, wherein the immunostimulatory molecule 
comprises a domain of an invasin protein (Inv). 

50. The synthetic construct of claim 48, wherein the immunostimulatory molecule 
comprises the sequence set forth in SEQ ID NO: 1467 or an immune stimulatory 
homologue thereof. 

51. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a T 
cell co-stimulatory molecule. 

52. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a T 
cell co-stimulatory molecule selected from a B7 molecule or an ICAM molecule. 

53. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a B7 
molecule or a biologically active fragment thereof, or a variant or derivative of these. 

54. The synthetic construct of claim 48, wherein the immunostimulatory molecule is a 
cytokine selected from an interleukin, a lymphokine, tumour necrosis factor or an 
interferon. 

55. The synthetic construct of claim 48, wherein the immunostimulatory molecule is an 
immunomodulatory oligonucleotide. 

56. An immunopotentiating composition, comprising an immunopotentiating agent 
selected from the synthetic polypeptide of claim 1, the synthetic polynucleotide of claim 30 
or the synthetic construct of claim 47, together with a pharmaceutically acceptable carrier. 

57. The composition of claim 56, further comprising an adjuvant. 

58. A method for modulating an immune response, which response is preferably directed 
against a pathogen or a cancer, comprising administering to a patient in need of such 
treatment an effective amount of an immunopotentiating agent selected from the synthetic 
polypeptide of claim 1, the synthetic polynucleotide of claim 30, the synthetic construct of 
claim 47, or the composition of claim 56. 

59. A method for treatment and/or prophylaxis of a disease or condition, comprising 
administering to a patient in need of such treatment an effective amount of an 
immunopotentiating agent selected from the synthetic polypeptide of claim 
1, the synthetic polynucleotide of claim 30, the synthetic construct of claim 47, or the 
composition of claim 56. 

60. A computer program product for designing the sequence of a synthetic polypeptide 
comprising a plurality of different segments of at least one parent polypeptide, wherein the 
segments are linked together in a different relationship relative to their linkage in the at 
least one parent polypeptide to impede, abrogate or otherwise alter at least one function 
associated with the parent polypeptide, said program product comprising: 

- code that receives as input the sequence of said at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that links together said fragments in a different relationship relative to their 
linkage in said parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 

61. The computer program product of claim 60, further comprising code that randomly 
rearranges said fragments. 

62. The computer program product of claim 60, further comprising code that links the 
sequence of a spacer residue to the sequence of said at least one parent polypeptide or to 
said fragments. 
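
The fragmenting, random rearranging and spacer-linking steps recited for the polypeptide design program can be sketched as follows. This is only an illustration under stated assumptions: the 30-residue fragment size, the 15-residue overlap, the two-alanine spacer and all function names are hypothetical choices, and the rearrangement only guarantees that the fragment order differs from the parent order.

```python
import random


def fragment(sequence, size=30, overlap=15):
    """Split a parent sequence into overlapping fragments (assumed sizes)."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    fragments = []
    for start in range(0, len(sequence), step):
        fragments.append(sequence[start:start + size])
        if start + size >= len(sequence):
            break  # this fragment already reaches the end of the parent
    return fragments


def rearrange(fragments, rng):
    """Return the fragments in a random order that differs from the parent order."""
    if len(fragments) < 2:
        return list(fragments)
    order = list(range(len(fragments)))
    while order == sorted(order):
        rng.shuffle(order)
    return [fragments[i] for i in order]


def link(fragments, spacer=""):
    """Join fragments, placing the spacer residues between neighbours."""
    return spacer.join(fragments)


def design_polypeptide(parent, size=30, overlap=15, spacer="AA", seed=None):
    """Fragment a parent sequence, shuffle the fragments and link them."""
    rng = random.Random(seed)
    return link(rearrange(fragment(parent, size, overlap), rng), spacer)
```

Passing a fixed `seed` makes a design reproducible, which is convenient when the same scrambled sequence has to be regenerated later.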

63. A computer program product for designing the sequence of a synthetic polynucleotide 
encoding a synthetic polypeptide comprising a plurality of different segments of at least 
one parent polypeptide, wherein the segments are linked together in a different relationship 
relative to their linkage in the at least one parent polypeptide to impede, abrogate or 
otherwise alter at least one function associated with the parent polypeptide, comprising: 

- code that receives as input the sequence of at least one parent polypeptide; 

- code that fragments the sequence of a respective parent polypeptide into 
fragments; 

- code that reverse translates the sequence of a respective fragment to provide a 
nucleic acid sequence encoding said fragment; 

- code that links together in the same reading frame each said nucleic acid 
sequence to provide a polynucleotide sequence that codes for a polypeptide sequence in 
which said fragments are linked together in a different relationship relative to their 
linkage in the at least one parent polypeptide sequence; and 

- a computer readable medium that stores the codes. 

64. The computer program product of claim 63, further comprising code that randomly 
rearranges said nucleic acid sequences. 

65. The computer program product of claim 64, further comprising code that reverse 
translates an amino acid of a respective parent polypeptide sequence to provide a codon, 
which has higher translational efficiency than other synonymous codons in a cell of 
interest. 

66. The computer program product of claim 63, further comprising code that reverse 
translates an amino acid of a respective parent polypeptide sequence to provide a codon 
which, in the context of adjacent or local sequence elements, has a lower propensity of 
forming an undesirable sequence that is refractory to the execution of a task. 

67. The computer program product of claim 63, further comprising code that links a spacer 
oligonucleotide to one or more of said nucleic acid sequences. 
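
A minimal sketch of the reverse-translation and in-frame linking steps is given below. The preferred-codon table is a hypothetical placeholder rather than data from the specification, the "undesirable sequence" is modelled simply as a list of forbidden motifs such as restriction sites, and the `GCTGCC` spacer oligonucleotide (two alanine codons) is an assumed example. The greedy codon choice only checks motifs inside each reverse-translated fragment, not across junctions.

```python
# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical preferred codon per amino acid; illustrative values only.
PREFERRED_CODON = {
    "A": "GCC", "R": "AGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def reverse_translate(peptide, preferred=PREFERRED_CODON, forbidden=()):
    """Pick the preferred codon unless it completes a forbidden motif."""
    dna = ""
    for aa in peptide:
        first = preferred[aa]
        others = [c for c, a in CODON_TABLE.items() if a == aa and c != first]
        for codon in [first] + others:
            trial = dna + codon
            # Only motifs ending inside the new codon can be newly created.
            if not any(m in trial[-(len(m) + 2):] for m in forbidden):
                dna = trial
                break
        else:
            raise ValueError("no synonymous codon avoids the forbidden motifs")
    return dna


def link_in_frame(sequences, spacer_oligo=""):
    """Join coding sequences, checking that every part keeps the reading frame."""
    for part in list(sequences) + [spacer_oligo]:
        if len(part) % 3:
            raise ValueError("length is not a multiple of three: " + part)
    return spacer_oligo.join(sequences)


def translate(dna):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

Translating the linked polynucleotide back with `translate` is a cheap way to confirm that the fragments and spacer residues appear in the intended order.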

68. A computer for designing the sequence of a synthetic polypeptide comprising a 
plurality of different segments of at least one parent polypeptide, wherein the segments are 
linked together in a different relationship relative to their linkage in the at least one parent 
polypeptide to impede, abrogate or otherwise alter at least one function associated with the 
parent polypeptide, wherein said computer comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said machine- 
readable data storage medium, for processing said machine readable data to provide said 
synthetic polypeptide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polypeptide sequence. 

69. The computer of claim 68, wherein the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments and 
linking together said fragments in a different relationship relative to their linkage in the 
sequence of said parent polypeptide. 

70. The computer of claim 68, wherein the processing of said machine readable data 
comprises randomly rearranging said fragments. 

71. The computer of claim 68, wherein the processing of said machine readable data 
comprises linking the sequence of a spacer residue to the sequence of said at least one 
parent polypeptide or to said fragments. 

72. A computer for designing the sequence of a synthetic polynucleotide encoding a 
synthetic polypeptide comprising a plurality of different segments of at least one parent 
polypeptide, wherein the segments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide to impede, abrogate or otherwise alter at 
least one function associated with the parent polypeptide, wherein said computer 
comprises: 

(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said machine-readable data comprise the 
sequence of at least one parent polypeptide; 

(b) a working memory for storing instructions for processing said machine-readable 
data; 

(c) a central-processing unit coupled to said working memory and to said machine- 
readable data storage medium, for processing said machine readable data to provide said 
synthetic polynucleotide sequence; and 

(d) an output hardware coupled to said central processing unit, for receiving said 
synthetic polynucleotide sequence. 

73. The computer of claim 72, wherein the processing of said machine readable data 
comprises fragmenting the sequence of a respective parent polypeptide into fragments, 
reverse translating the sequence of a respective fragment to provide a nucleic acid 
sequence encoding said fragment and linking together in the same reading frame each said 
nucleic acid sequence to provide a polynucleotide sequence that codes for a polypeptide 
sequence in which said fragments are linked together in a different relationship relative to 
their linkage in the at least one parent polypeptide sequence. 

74. The computer of claim 72, wherein the processing of said machine readable data 
comprises randomly rearranging said nucleic acid sequences. 

75. The computer of claim 72, wherein the processing of said machine readable data 
comprises reverse translating an amino acid of a respective parent polypeptide sequence to 
provide a codon, which has higher translational efficiency than other synonymous codons 
in a cell of interest. 

76. The computer of claim 72, wherein the processing of said machine readable data 
comprises reverse translating an amino acid of a respective parent polypeptide sequence to 
provide a codon which, in the context of adjacent or local sequence elements, has a lower 
propensity of forming an undesirable sequence that is refractory to the execution of a task. 

77. The computer of claim 72, wherein the processing of said machine readable data 
comprises linking a spacer oligonucleotide to one or more of said nucleic acid sequences. 

CONSENSUS A- CPZ FROM LOS ALAMOS HIV SEQUENCE DATABASE 
ISOLATE-E SEQ FROM ISOLATE 93TH253 THAILAND 

[Schematic not reproduced; the recoverable labels are listed below.] 

Full length ~17000 bp, built from Cassette A (~5600 bp), Cassette B (~5600 bp) and 
Cassette C (~5800 bp) 
Restriction sites shown at the cassette ends: XbaI*, BamHI, XhoI, SalI, BglII, EcoRI, SalI* 
Junction labels on the full-length construct: SalI/XhoI destroyed; XhoI intact 

Full length construction after cloning the cassettes into pBS 
Sites marked with a * are in the pBS MCS 


Cassette Extras (Can be removed from cassette ends) 

A (37bp)      BamHI/Kozak Start      SalI   Stop BglII  EcoRI 
        5' gc ggatccacc   atg   ...  gtcgac tga  agatct gaattc gc 3' 

B (43bp)      BamHI/Kozak Start XhoI        XhoI   Stop BglII  EcoRI 
        5' gc ggatccacc   atg   ctcgag ...  ctcgag tga  agatct gaattc gc 3' 

C (37bp)      BamHI/Kozak Start XhoI        Stop BglII  EcoRI 
        5' gc ggatccacc   atg   ctcgag ...  tga  agatct gaattc gc 3' 
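
The stated sizes of the cassette extras can be checked by simple addition. The sketch below assumes the flanking elements read from the lines above, with the BglII element taken as the standard `agatct` recognition sequence in all three cassettes; the dictionary and function names are hypothetical.

```python
# Flanking elements per cassette, 5' to 3', with the size stated in the figure.
CASSETTE_EXTRAS = {
    "A": (37, ["gc", "ggatccacc", "atg", "gtcgac", "tga",
               "agatct", "gaattc", "gc"]),
    "B": (43, ["gc", "ggatccacc", "atg", "ctcgag", "ctcgag", "tga",
               "agatct", "gaattc", "gc"]),
    "C": (37, ["gc", "ggatccacc", "atg", "ctcgag", "tga",
               "agatct", "gaattc", "gc"]),
}

# Recognition sequences of the restriction enzymes named in the figure.
SITES = {
    "BamHI": "ggatcc", "SalI": "gtcgac", "XhoI": "ctcgag",
    "BglII": "agatct", "EcoRI": "gaattc",
}


def extras_length(elements):
    """Total number of extra base pairs contributed by the flanking elements."""
    return sum(len(element) for element in elements)


def sites_present(elements):
    """Names of the restriction sites found in the joined flanking elements."""
    joined = "".join(elements)
    return sorted(name for name, site in SITES.items() if site in joined)


def check_extras():
    """Compare each computed length with the size stated in the figure."""
    return {name: extras_length(elements) == stated
            for name, (stated, elements) in CASSETTE_EXTRAS.items()}
```

Under these assumptions the element lengths sum to 37, 43 and 37 bp, matching the sizes given for cassettes A, B and C.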

Cassette Construction 

[Schematic not reproduced; the recoverable labels are listed below.] 

Full Length 5687bp 
A1-A4 3330bp 
A5-A8 2500bp 

Subcassettes: A1/A2 1670bp; A3/A4 1670bp; A5/A6 1670bp; A7 840bp 
Restriction sites marked at the subcassette ends: XbaI, SpeI, NheI 
Junctions: SpeI/XbaI, AvrII/NheI, AvrII/NheI 

Subcassette Extras (Can be removed from cassette ends) 

SC1 (A 28bp, B/C 34bp)                            SpeI   EcoRI 
        5' As for 5' of Cassettes            ...  actagt gaattc gc 3' 

SC2 (28bp)    BamHI  XbaI                         NheI   EcoRI 
        5' gc ggatcc tctaga                  ...  gctagc gaattc gc 3' 

SC3 (28bp)    BamHI  SpeI                         AvrII  EcoRI 
        5' gc ggatcc actagt                  ...  cctagg gaattc gc 3' 

SC4 (28bp)    BamHI  NheI                         XbaI   EcoRI 
        5' gc ggatcc gctagc                  ...  tctaga gaattc gc 3' 

SC5 (28bp)    BamHI  SpeI                         AvrII  EcoRI 
        5' gc ggatcc actagt                  ...  ccatgg gaattc gc 3' 

SC6 (28bp)    BamHI  NheI                         XbaI   EcoRI 
        5' gc ggatcc gctagc                  ...  tctaga gaattc gc 3' 

For Cassettes A and B only 

SC7 (37bp)    BamHI  NheI 
        5' gc ggatcc gctagc                  ...  As for 3' of Cassettes A/B 

For Cassette C only 

SC7 (28bp)    BamHI  NheI                         SpeI   EcoRI 
        5' gc ggatcc gctagc                  ...  actagt gaattc gc 3' 

SC8 (31bp)    BamHI  XbaI 
        5' gc ggatcc tctaga                  ...  As for 3' of Cassette C 

FIGURE 14 (Cont) 

FIGURE 15 

[Sequence sheets not reproduced: the double-stranded nucleotide sequence, written with IUPAC degenerate bases, and its amino acid translation are not legibly recoverable. The annotations that remain legible are listed in order.] 

Nucleotides 1 to 960: BamHI site, Kozak sequence and start codon; env 185-214 (149); gag 76-105 (6); pol 31-60 (36); pol 316-345 (55); pol 361-390 (58); pol 46-75 (37); tat 46-75 (121); pol 1-30 (34); rev 106-122 (131); spacers; subcassette join (A2); gag 91-120 (7); pol 601-630 (74). 

Nucleotides 970 to 1920: env 46-75 (140); pol 76-105 (39); pol 196-225 (47); rev 16-45 (125); env 525-554 (171); env 31-60 (139); spacers; rev 1-30 (124); vif 16-45 (101); subcassette join (A2, A3); pol 661-690 (78); pol 916-945 (95). 

WO 01/090197 PCT/AU01/00622 

1930 env 405-434 (163) i960 1970 i960 1990 2000 

ATGACATGGATKSAGTGGGAGAGAGAGATTAGCAATTACACAARCCWAATCTATRAGATTCTOARACCCGAACCCACAGC 
TACTGTACCTAMSTCACCCTCTCTCTCTAATCGrrAATGTGTT^GG 
MTWXXWERE I SNYTXXZYXI L 1 X P E P T A> 

2010 2020 gag 451-480 (31) 2050 2060 2070 2080 

ccctcccgctgagartttcrgattcggtgaggaaactacaccctccckaaagcaagagcm;^ 
gggagggcgactctyaaagyctaagccactcctttgatgtgcxsag 

p paexfxfg eett psxkqexk d k e q y> 

2090 2100 2110 DOl 106-135 (41) 2140 2 i 50 2 i 60 

atcagattmttattgagattt-gcgccaagaaagctattggtacagtgctcgtgggacctacccctgtc 
t ac tc t aaj< aat a actctaaacgccgttctttcg a taacc a tgtcacgagc^ 

dqixieicgkk aigtvlvgptpvni ic> 
2170 2180 2190 2200 vpr 46-75 (1 1 5) 2230 2240 

agJatttacgaaacctatggcgatacctgggagggcgtcgaggctctgatcagaaycctc 
tctjtaaatgctttggataccgctatg^ 
' r'i yetygdtwegvealirxlqqlxfxh> 

2250 2260 2270 2280 2290 tat 31 -61 (1 20) 2320 

ft * * * * * * * 

aaagtcttagcct^aawagtaacggttsacacaaaagagtggtttccagagccct 

frig'cxhcqxcfltkglgisx grk k r> 

I 2330 23<o spacers 2370 2380 tat 1-30 (118) 



/ 



>gagctccccaag« 
;ctcgaggcgt-Jc< 

X A P O I i 



QRRXAPOlAA 



racagagaaggsgagctcccca^gctgcc atggaccccgtggaccccaasctggagccttggaawcaccctggctcccag 

YTGTCTCTTCC SCTCG AGGCCTTjCG ACGCfTACCTCCGGC A C CTCGCCTT S G ACCTCCCAACCTTWGTCCCACCGAGGGTC 

M DPV D PX LE PWX H PC SQ> 



2410 2420 2430 2440 2450 2460 2470 2480 



CCTAMGACAGCCTGTWMCAAATGCTATTGCAAAAAGTG< ^g^^C GAAGAGACAACCCCTAGCCMGAAACAGGAACMGAA A3 
GGATKCTGTCGGACAUn(GTTTACGATAACG M j " r ' j " lU *CA j 0 j n 



XTACXKCYC KKCPSEETTPSXKQEXK> 



A4 



gag 466-495 (32) 251 ° 252 ° 2S3 ° 254 ? 25S ° 256 ° 1 



AGACAAAGAACWCTACCCCCCTTYAGCCAGCCTCAAGTCCCTGTTTGGCAATGAdAATTTCAATATC 
TCrrGTTTCTTGWGATGGGGGGAARTCGGTCGGAGTTCAGGGACAAACCGTTACTQTTA^ 

DKEXY PPXAS LK S l fgnd'nfn MWKNX> 



2570 env 91-120 (143) 2600 2610 2620 2630 2640 
tggtggajicagatgcamgaagacrttatctcactatgggacc^ 

accacctkgtctacgtkcttctgyaatagagtgataccctggtttcggagttxtggaacgcagttcjgagctg^ 
mvxqmxeoxi slwdqslkpcvkldvgd> 

2650 2660 pol 256-285 (51) 2690 2700 271 ? 2720 

GCCTATTTCTCCGTGCCTCTGGATRAARRCTTCAGAAAGTATACCGCTTTCACAATC 
CGGATAAAGAGGCACGGAGACCTAYTTYYGAAGIXTTTTCATATGGCCAAAGTGTTAGGGATCGT^ 
AY FSVPbDXX PR KYTAFTI PSXNNE , QL> 



2730 2740 2750 pol 751 -780 (84) 278 ° 279 ° 



2300 



GAAAGGCGAAGCCATSCATCCCCAAGTGRATTGCarACCAGGCATTTGGCA^CTGGATTGCACACACC^CCACGGAAACR 
CTTTCCGCTTCGCTASGTACCGGTTCACYTAACGAGTGGTCCGTAAACCGT^ 

K G E A X HGQVXCS PGIWQLDCTHLEGK> 

2810 2820 2830 2840 pol 1 66"1 95 (45) 2870 2880 

TTATqCCTAAGCT'CAAGCAATGGCCTCTGACAGAGGAAAAGATTAAGGCTCTGACTGMGATT^ 
AATAdGGATTCCAGTTCGTTACCGGAGACTGTXrrCC^rrTCTAATTCCGAGACTGACKCTAA^ 

X I PKVK QWPLTEE K I KALTXIC X EME X> 

FIGURE 15 (Cont) 
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28so 2900 2910 po | 33-1 .360 (56) 294 ? 295 ? 296 ° 

GAGGGAAAGArrAGtjATGGATGACCiCTACGTLGGei'CCGACCTGGAGATTGGCCAACATAGGBCCAAAATCGAAfiAGCT 
CTCCGrTTCTAATCqTACCTACTGGAGATGCAGCCGAGGCTGGACCTCTAACCGGTTGTATCCYGGTTTTAGCTTCTCGA 
EGKIS 1 MDDLyVGSDLEIGOHRXKIEEL> 

2970 2980 2990 3000 p(j| 616 . 645 (75) '030 '040 

CAGGSMACACCTCCTGARATGGGGflCTCACCGAMACCACAAACCAAAAGACTGAGCTCCAMGCTATCCAWCTGGCTCTGC 
GTCCSKTGTGGAGGACTYTACCCCiIgAGYGGCTKTGGTGTT^ 

rxh llxwg'ltxttnqkt ELXAIXLA L> 

3050 3060 3070 3oeo 3090 pol 796-825 (87) 312 ? 

AAGACTCCGGCTYACAGGTCAACATTGTGArACAqATTC 
TT^TGAGGCCGARTCTCCAGTTGTAACACTGTCTGTAAGGGCGA 

Q *D S GXEVN I V TD I PAETGQETA Y F X I# K> 

3130 3140 1150 3160 3170 3180 3190 3200 

* * * , 

CTGGCTGGC AGATGGCCTCTGAR AR YCATTC AC AC AGAC AATGGdU^ 
GACCGACCGTCTACCGGACACTYTYRGTAAGTGTGTCreTTACCdTCCTGTTTCTA^ 
LAGRWPVXX'IHTDNG , RTK IEELRXHLL> 



pol 346-375 (57) 



3240 3250 3260 3270 3280 



CARATGGGGCTTCACAACCCCTGACAAAAAGCATCAGAAAGAGCCTCCCTTTCT( jEGp^G 
GTYTACCCCGAAGTGTTGGGGACTGTTT^CGTAGTCTTTCTCGGAGGGAAAGA(}2^X^ 



3 A4 

3 GTCAAG AAACTG AC AG AG G f*? 
^CAGTTCTTTGACTGTCTCC JOin 
G F T T P D KKHQKEPPFLSSVKKLTE> A5 

3290 vif 166-192 (111) 332 ° 333 ? ^spacers 3360 | 

ATARGTGGAACRAACCCCAGAAAAYCAAGGGACRCAGACRAAATCACACAATGAATGGCCA1 GCTGCC \CAGAGTCCCAG 
TATYCACCTTTGYTTGGGGTCTTTTRGTTCCCTGYGTCTCYTTTAGTGT^TTACTTACCGGT^ CGACGG pgtctc a gggt c 
D X WNX PO K X K G X R X NH T M NG H [_A AjT E S Q> 



3370 



3380 env 435-464 (165) 341 ° 



3420 3430 3440 



AATC A GC AAG AC AGAAACGAAMAGG AMCTGC TGGMGCTCGAC AAATGGGCAAGCCTCTGGA ATTGGT TT RACATT AS Cp A 
T^AGTCGTTCTGTCTTTGCTTKTCCTKGACGACCKCGAGCTGTTTACCCGTTCGGAGACCTTAACCAAAYTGTAATSGpT 
NQQDRNEXXLLXLDKWAS LWNWFXIXD> 

3450 3460 3470 gag 1 21 -1 50 (9) 350 ° 3S1 ? 352 ° 

CACCGGAARTAGCTCCMAAGTGTCCCAGAATTACCCTATCGTCCAGAATSYCCAAGGCCAAATGGTCCACCAASCCMTCT 

GTGGCCTTYATCCAGGKTTCACAGCGTCTTAATGGGATACCAG^ 

TGXSSXVS0NYPIV0NXQG0MVHQXX> 

3530 3540 3550 3560 env 480-509 (168) 359 ? 360 ? 

C CC CC AG^IcTC RTCGG ACTG A G AATCR TTTTCGCTGTGCTC AG C ATT RTC AAT AGGGTC AGGC A AGGC T AT AGCCCTCTG 
GGGGGTCi£aGYAGCCTGACTCTTAGYAAAAC<GACAC^^ 

S p R » L X G L R I X F A V L S I X N fi V R 0 G Y S P L> 

3610 3620 3630 3640 3650 V jf 1 06-1 35 (1 07 3 680 

***** * 

TCC TTCC AAACCCTCM YcjcTC ATCC ATCTG Y A WTA CTTTG A CT GTTTC K CTG A CTCC RCC ATTAGG AG A GCC ATC C TGGG 
AGGAAGGTTTGGGAGKRaGAGTAGGTAGACRTWA'PGAAACTGACAAAGMGACTGAGGYGGTAATCCTCTCGGTAGGACCC 
S F Q T L X L. I HLXYFDCFXDSXIRRAI LG> 

3690 3700 3730 3720 3730 3740 3750 3760 

* " * * 

ACASAIttGTGAGMAGGAGATGCGAATAcjGCTGTGGGAMTCGGA 



TGT STMTC ACTC KTC CTCT ACGCTT ATC 
XXVXRRCEV 



CGACACCCTKAGCCTCGGTACWAGRAACCGAAAGACCCACGGCGACCGAGGT 
AVGXGAMXXGFLGAAG S> 



env 300-329 (156) " 3 ? 38 °? 3 "? 3 "? 383 ? ,M ? 

CCATGGGCGCTGCCTCCATSACACTGACAGTGCAAGCcjTATGACCCTAGCAAAGACCTCRTTC 
GGTA^CCGCGACGGAGGTASTGTGACTGTCACGTTCGdATACTGGGATCGTTTCTGGAGYAACG 

T^M GAASXTLTV Q A 1 Y D P 5 K D LXAE 1 Q K Q> 
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pol 466-495 (65) 3870 3890 3850 3900 3910 3920 

GCTCAGGRTCAGTGGACATWTCAGATTTWCCAA^^ 
CCAGTCCYAGTCACCTGTAWAGTCTAAAWGGTTCTCGGAAAGTTT 
GQXQWTXQI X Q E P FKn'gTVLVGPTPVN> 

3930 DO | 121-15(M42^ 3960 3970 3980 3990 4000 

catcatcggaaggaacmtgc^tgacacagmttggcygcaccctcaactttcccatt^ 
gtagtagccttccttgkacgactgtgtckaaccgrcgtgggagttgaaagcg^^ 

i igrnxltqxgxtlnfpis , kgspaxf> i 

4010 4020 doI 301-330 (54) 4050 4060 4070 4080 

* \/ » * * * A5 

AGTC C AGC ATGMC AMA GATTC TGG AGCCTTTTA GG AWAMAA AACCCTG AS ATGGTC ATCTATC AGTA1 ^^^^^p CCTCTG . . 
TCAGGTCGTACKGTKTCTAAGACCTCGGAAAATCCTVrrKTTTTGGGACTSTACCAGTAGATAGTCATI GG3S&6X GGAGAC J" ,n 

— A6 



SSMXXI LEPFRXXNPXMVIYQYPSPL> 

4090 4100 4110 nef 136-165 (188) 4140 4150 4160 

ACATTCGGATGGTGTTTCAAACTGGTCCCCGTGGACCCCAGSGAAGTGGAAGAGRyCAACRAGGGCGAAAACAATTGCC^ 
TGTAAGC CTAC C ACAAAGTTTG ACC AGGGGC ACCTGGGG TC SCTTCACCTTCTCYRGTTG YTCCCGCTTTTGTTA AC GG A 
TFGWCFK I, VPVDP XEVEEXNXGENNC L> 

4170 4180 4190 4200 pol 271-300 (52) 423 0 4240 

CCTaTTTAGGAAATACACAGCCTTTACCATTCCCTCCAYCAATAACGAAACCC 

GGAq AAA TCCTTT ATGTGTCG GAAATGGTAAGGGAGGTR GTTATTGCTTTGGGG ACCGTAATCCATAGTCATATTGCAGG 
L FRKYTAFTI PSXNNETPGIRYQYN V> 

42j>o 4260 4270 4280 4290 en V 31 5-344 (1 57) 4 320 

.»**•♦ * 
TGCCT^AGGGATGuGGAAGCACAATGGGAGCCGCCAGCATKACCCTCACCGTCCAGGCTAGGCWACTGCTCAGCGGAATC 
ACGGAGTCCCTACqCCTTCGTGTTACCCTCGGCCGTCGTA^ 

L P qgw'gs TMGAA SXTLTVQ.ARX LL SG I> 
4330 4340 4350 4360 4370 n 0 | 451-480 (64) 4400 

GTCC AGC AACAG A RC AATCTG CTClG MGG AGAATAG GG AA A TCCTC ARAG AG CCTGTGCA TGGCGTCTACTA CG ATCCCTC 
CAGGTCGTTGTCTYGTTAGACGAuCKCCTCTTATCCCTTTAG^AGTYTCTCGGACACGTACCGCAGATGATGCTAGGGAG 
VQQQXNL l'xENR E I LXEPVHCV Y Y DP £> 

4410 4420 4430 4440 4450 ypU 61-81 (136) 4480 

• • * . » * * 

CAAGGATCTGRTCGCTGAARTCCAAAAGCAAGGuASAGAGGAACTGTC 
GTTCCTAGACYAGCGACTTYAGG^*TTTCGTTCC(fTSTCTCCTTGACAGX»YG 

KDLXAEX 0 K O G X £ ELSXXVDMGNYD L> 



spacers 45io 4520 4530 vpr 61-90 (116) 456 ° 



G V C N N L 



A A 



GAGTGGACAATAACCTC GCCGC7 ATTAG AA YCCTGC AAC AGC TCMTGTTCRTTC ACTTTAGGATTGGCTGC C R GC ACTCC 
CTC ACCTGTTATTGGA< CGGCG* TAATCTTR GGACGTTGTCGAGK ACAAGYAAGTGAAATCCTAACCGACGGYCGrGAGG 
XRXLQQLXFXHFRXGC XHS> 



4570 4580 4590 4600 4610 g 3 g 406-435 (28) 4640 

AGGATTGGCATCMYCCGTCAGAGAAGGGSCAG; GCTCCCAGGAAAAAGGGATGCTGGAAGTGTGGCARAGAGGGACACCA 
TCCTAACCGTAGKRGGCAGTCTCTTCCCSGTC1 CG AG GGT^TCTTTTTCCCTACGAC CTTC ACACCGT YTCTC CCTGTGGT 
RIG1XRQRRXRA PRKKGCWKCGXE GH Q> 

4650 4660 4670 4680 4690 4700 4710 4720 

* . » * * * • * 

GATGAAGGATTGCACTCAGAGACAGGXTTAACTTTCTGGGAAA 
CTACTTCCTAACGTGACTCTCTGTCCGATTGAAAGACCCTTTcJcTWCGGTCTC 

MKDCTERQAN FLGK'XARLXIXTYWGL> 

vif 61-90 (104) 47s ? 47£ ? 477 ? 47a ? 479 ? 4B0 ? 

ATACCGGTXSAGAGAGACTGG^ASCTCGGCCAWGGCGTCAGCATTGAGTGGAGGA 

TATGGCCACTCTCTCTGACCGTSGAGCCGGTWCCGCAGTCGTAACTCACCTcdTRTTCCCTTTCCCGACTCCTATCGCCG 
HTGERDHX LGXGVS IEWr'xR ERAEDSO 
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vpu 46-75 (135) 4830 4840 4850 4860 4870 4880 



S2io 522ol 5230 5240 52so gag 390-420 (27) 528 ? 

gagccgasa'ga'cagccaacaacctccagc|tgtttcaattgcggcaaacagcgacacmttgccaraaactgtagggcccct 

C TCGGCTSTCTGTCC CTTGTTCG AG GTCu AC AAAG TT AA C GCCGTTTCTCCCTGTGKAACGGTYTTTGAC ATCCCGGGGA 
G A X R QGTS SS'C F NCG K E G HX A X NC R A P> 

5290 5300 5310 5320 5330 5340 5350 5360 

CGCAAGAAAGGTTGTTGGAAATGCGGAARGGAAGGCCAT|CAAATGAAAGACTC^ 
GCGTTCTTTCCAACAACCTTTACGCCTTYCCTTCCGGTJfc^ 
RK KGCWKCCXEG H Q M KDCTE RQAN FL G> 

gag 421-450 (29) 5390 5400 5410 5420 5430 5440 

caaaatctggccctcchrcaaaggcagacccggaaactttcycca^ 

gttttagaccgggaggkygtttccgtctgggcctttgajvagrggtttcqttkaccgagaccatatagt^ 

K Z WP S XKGR PGNFXQ S'XWLWY I K I FI> 

^50 env 465-494 (167) * 480 549 ° 5500 . 551 ? SS2 ? 

TGATCCTCGGTGGACTGRTTGGCCTCAGGATTRTCTTTGCCGTCCTC 

ACTAGCAGCCACCTGACYAACCGGAGTCCTAAYAGAJVACGGCAC^ACAGGTAGYAATT(jcCTCGGCRCTCGGYTCTGGAG 
MIVGGLXGLRIX FAVLSIXN'gAXSXDL> 



5530 



5540 nef 31-60 (181) ss ™ 558 ° spacers 



GATAAACATGGCGCTMTTACAAGCTCCAATACCSCTGCCAATAACSCTGACTGTGYCTGGCTGRAGGC'i GCTGCC ATGAC 
CTATTTGTACCGCGAKAATGTTCGAGGTTATGGSGACGGTTATTGSGACTGACACRCACCGACYTCCGJ CGACGG TACTG 
DKHGAXTSSNTX ANNXDCXWLXA |_A Aj M T> 



AACGAAAGCGAAGGCGACASAGAAGAGCTCAGCRCAWTGGTWACATGGGCAATTACGATCTG ^gjg CCTGCCCCCAG A6 
TTGCT'rTCGCTTCCGCTGTSTCTTCTCGAGTCGYGTWACCACCTGTACCCGTTAATGCTAGAC^^MiG^GGACGGGGGTC j 0 j n 
NESEGDXEELSXXVDMGNYDLSS P A P R> 9 ^ 

4890 enV 510-539(170) «20 4930 4940 49SO 4960 I 

GGGACCCGATAGGCYGGRGRGAATCGAAGAGGAAGGCGGAGAGCRAGRCAGAGRCAGAAGCGTCAGGCTCGTGARTGGqA ' 
CCX'TGGCCTATCCGRCCYCYClTAGCTTC'l^CTTCCGCCTCTCGYTCYGTCTCYGTCTTCGCAGTCCGAGCACTYACCdT 
G PD RXXXI E EEGGEX XRXRSVRL V X G> 

4970 4980 nef 1 51 -1 80 (1 89) 501 ° 502 ° 503 ? 504 ? 

GWGAGGTCGAGGAARYCAATRAGCGAGAGAATAACTGTCTGCTCCACCCTATSRGTCWACATGGCATGGAAGACGAAGAS 
CWCTCCAGCTCrTTYRGTTAYTCCCTCTCTTATTGACAGACGAG^ 

X EVEEXNXGEN.NCLLHPXXX HGMEDEX> 

5050 5060 5070 pol 961-990 (98) 510 ? 511 ° 512 ? 

agagaggtJaatagcgatatcaaagtggtccccaga^ggaaagccaaaatcattagggattacggaaagca^ 
tctctccac ttatcgctatagtttcaccagg^gtcttcctttcggttttagtaatcccta^tgcctttcgtttaccgacc 
rev'nsdikvvpr bkakiirdyckomao 

5130 S140 5150 5160 Do! 16-45 (35) 5190 5200 

C GMTGACTGTGTGGCC RGGTTC Y CTTCCG AGC AAAC ARGGGCTAACTCC YCTRCAAGC AG AAAGCTGGG AG ACGG AGGCG 
GCKACTGACACACCGGYCdAAGRGAAGGCTCGTTTGTYCCCGATTGAGGRGAYGTTCGTCTTTCGACCCTCTGCCTCCGC 
Tt nrvAx'FXS EQTXAN SXXSRKLG DGG> 

I 



5610 5620 5630 V pU 1-30 (132) 5 "?" 56? ? 

ACCCCTGGAGATCATCGCTATCGTCGCCYTTATCGTCGCCCTCATCMTAGCCATTGTGGTCTGGACAATCGYCTVCATTG 
TGGGG ACCTCTAGTAGCGATAGC AGCGG R AATAGC AGCGCG AGTAGK ATCGGTAAC ACC AG AC CTGTTAGC RG A WGTAAC 
PLEIIAIVAXIVALI XAIVVWTIXXI> 

5690 5700 5710 5720 pol 1 36-1 65 (43) S1SQ S1 ™ 

* . * * * 

G^SSsi^AATMTGCTCACCCAAMTC^GGAYGCACACTCAATTTCCCTATCTC join 
X=*Gel^l rTAKACGAGTGGGTTKAGCCTRCGTGTGACTTAAAGGGATAGAGGGGGTAACT.STGTCACGGACACTTT R1 
foffr^^H otcojxTVPV K> ° ' 

I 



t 

A7 



AGTA 



EYVENXLTQ 
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s-no spacers seoo ssio env 255-284 (153) 5840 

« i r^- * * 



* i * 

CTGAAACCCGGAATGGATGGC GCCGCC AYCTTTAGGCCTCGCGCAGGCRATATSARAGACAATTGGAGAAGCGAACTGTA 
GACTTTGGGCCTTACCTACCC CGGCGC rR GA AATC CGGACC GCCTCCG Y TATA STYTCTGTTAACCTCTTCGCTTG AC AT 
L K P G N D G A A X FRPGGG-XXXDNWKS EL Y> 



5B50 5860 5870 5880 5690 5900 5910 5920 

TAAGTATAAGGTtGTGRAGATTTlAGCCTCTGGGARTC|ACATGG 

ATTCATATTCC AGCAC YTCTA A YTCGGAGACCCTY ACIIXH'ACCTA AG^OCTT ACCCTCAAGCAGTTGTGTGGGGGTGAC C 
K Y K V V X I XPLGx'tWI PEWEFVNTPPI>> 

pol 556-585 (71) 5 ' 6 ? 5 "? s "1 S "° 6 °°? 

TCA agctatggtatcagctggagaaagascctat«^ 

AGTTCGATACCATAGTCGACCTCTTTCTSGGATAG^ 

vklwyqlekxpi xgxe , pqdlnxmlnxv> 



6010 gag 181-210(13) 



6040 6050 6060 6070 6080 



GGAGGCCATCACGCCCCTATGCAAATGCTGAAAGASAC^ 
CCTCrOTTACTCCGGCGATACGTTTACGACTTTCT^ 

gghqaamqhlkxtineeaa , vlfldgix> 



6090 6100 pol 706-735 (81) 6130 6140 6150 6160 

CAJ^GCTCAAGAGGAACATGAC^RGTATCACTCCAACTGGAG^ACA^ 

gtttcgaotctccttctactcxycatagtg^ccttcacctcct^ 

kaoeehexyhsnwrtmaxxfnl , xkhx> 

6170 6ieo 6190 gag 31-60 (3) 6220 623 ° 624 ? 

tctgi^ctctagggagctggagagattcgctctgaatcccrgcctgctg^ 

agacccggagatccctcgacctctctaagcgagact™gggycggacgacctctgthcgct^ 

vwasrelerfalnpxlletxegcxqt'a> 

6250 6260 6270 6260 en V 215-244 (151) 631 ? " 2 ? 

gagcaagagattatcattaggtccgagaatytcacaracaatgycaaaaccattatcgtccaw 
ctccttctctaatagtaatccaggctcttaragtgtytgttacrgttttggta^^ 

BEE I I I RSENXTXNXKTIIVXLNXSVX> 
6330 6340 6350 6360 6370 gag 1-30 (1) 640 ° 



* * * 

gattaacUtgcgcgctagggctagtgtcctcagkggcgccragct^ 
»aatt*acccgcgatcccgatcacaggagtckccgccgytcgacctgcggacccttttct 
in 1 hcarasvlxggxldawexirlrpg> 

6410 6420 6430 6440 6450 net 91-120 (185) 6480 

GAAAGAAAAAGTATAGqCTCAAGGAGAAGGGAGGCCTGGASGGACTGRTTTACTCCMAAAAGAGG 

ctttcit-it^atatccJgagitcctcttccctgcx;gac^ 

gkkkyr 1 l kekggcxglxysxkrqxild> i 

6490 6500 6510 6520 6530 6540 6550 6560 I 

» * * ■ * B1 

CTGTGGGTGTATMACACACAGGGATTC gqgggj -jtSGGGAACCWTGATCCTCGGCWTGGTGATKATCTGTAGCGCCACCGA 

gacacccacataktgtgtgtccctaag^^^accccttggwactaggagccgwaccactamtagacatcgcggtcgct 
lwvyxtqgftrw gtxi lgxvxicsasx> «<f 

1C AK MO o\ 6590 6600 6610 6620 6630 6640 i 

env 16-45 (138) , . . * * T 

SAATCTGTGGGTGACAGTGTA1TACGGAGTGCCTGTGTGGAG<^ 

sttagacacccactgtcacataatgcctcacggacacacctcutctgwcgaggacaggccgtaacacgttgtcgtttyat 
nlwvtvyygvpvwr , rxllsgivqqqx> 

6650 env 330-359 (158) S6S ° 670 ? 671 ? 6?2 ? 

acctcctgagggctatcga^gcccaacagcatctgctccagctcaccgtctgJgtcagg 

TGGAGGACTCCrGATAGCT-rCGGGTTGTGGTAGACGAGGTCGAGTGGCAGAcdCAGTCCGTAAAGGGGTCCGGAACCGAG 
NLLRAI E ACOHLLQLT VW l V/RHFPRPWL> 
vpr31-60(114) 67S ? 6 "? 67e ? 679 ? 68 °? 

CACRRCCTGGGACAGyACATC^ATOAGACATACGGAGACACATCGW^GGCAGTCGAAGCCCTq^GCCCTCATCA^4ACC 
GTGyYGGACCCTGTCRTGTAGATACTCTGTATGCCTCTGTCTACCMKCCCTCACCTTCGGGAGjTKTCGGGAGTAGTKTGG 
HXLGOX IYETYGDTWXGVEAL'XA L I X P> 

6810 vif 151-180(110) 684 ° 685 ? 686 ? 6n ™ 688 ? 

CAAAAAGATTARGCCTCCCCTCCCATCCGTGAAAAAGCT^ACC^ 
GTTTTTCTAATYCGGAGGGGAGGGTAGGCACTTT-TTCGAGr^ 

K KI X PPLPSVKKLTEDXWNXPQK X 1 Y S> 

6890 690o po! 901 -930 (94) 693 ! 694 ? 695 ? 696 ? 

CTGGCGAAAGGATTRTCGATATCATTGCAWTCGACATTCAGAC 

G AC C GCTTTCCT AA Y AGC TATAGT AACGTWGGCTGTA AGTC1GATTCCTTGACGTTTTS GTTTAG K RTTTCT AAGTCTTA 
AG ER I X D I I A X D I QT K E L 0 X Q I X K I Q N> 

6970 6980 6990 pol 886-915 (93) . 7020 . 703 ° 7 ° 4 ? 

» • * * 

ttcJgctctgtttatccataactttaag^ 

AAGCGACACAAATAGGTATTGAAATTCTCCTTCCCTCCGTAACCGCCGATGAGGCGGCCTCTCTCTTAGYAvACTG 
F ' A VFIHNFKRKGG ICG YSAGERIX DI I> 

7050 7060 7070 7080 gag 256-285 (18) 7I1 ? 7a2 ° 



CC^CASCGATATcIrTTCCCGTCGGCGAWATCTATAAGAGATG^ 
CCGGrrSGCTATAdYAAGGGrACXrCGCTWTACATATTCTCTAC^^ 

AXOI^XPVGXI YKRWI I LGLNKIV R M Y> 

7130 7i4o 7150 7160 7170 env 495-524 (1 69) 720 ° 

kacccgtcagcat*tctggatat<Iagagtgagacagggatactcccccctcagctttcagacactgmygcccgctcccaga 

KTGGGCACTCGTAAGACCTATAJrCTCACTC'rG-rCCCTATGAGGGGGGAGTCGAAAGTCTGTGACKRCGGGCGAGGGTCT 
X P VSILD I 1 R V R QGY SP L SFQTLXP A P R> 

7210 7220 7230 7240 7250 7260 7270 7280 

* I 

GGCCCTGACAGACYCGRASGCATTGAGGAAGAGfTCCAGSCAGGACCATCAGTATCCCATTYCCGAACAGCCTCTGYCTCA 
CCGGGACTGTCTGRGCYTSCGTAACTCCTTCTdAGGTCSGTCCTGGTAGTCATAGGGTAARGGCTTGTCGGAGACRGAGT 

gpdrxxxieee'sxodhqypixeqp l, x q> 



tat 61-90 (122) 



7310 7320 7330 7340 73S0 7360 



GMCAAGGGGAGRCAATCCCACAGRCCCTRAGGAAAGCAAAAAC| 
CKGTTCCCCTCyGTTAGGGTGTCYGGGAYTCCTTTCGTTTTTCg^^^ 

X RGXNPTXPXESK KAS GVVESHNK E L>> g 3 



p^GGAGTGGTCCACTCCATGAATAAGGAACTGA B2 
^'gCCTCACCAGCTCAGGTACTTATT CCTTG ACT jOIR 



7370 pol 856-885 (91) 7 * 00 . 741 ? 7 * 2 ? 7 °? 744 ° 

AJUU^GATTATCGGACACGTCAGGGAMCAGGCTGAGCACCTC 
T-rTTCTAATACCCTCTCCACTCCCTKGTC 

K K * I IGQVRXOA EH LKTAVQM'AAMQM L K> 



* 



74S0 7460 gag 196-225 (14) 7490 



7500 7510 7520 



GAWACCATTAACGAAGAGGCTGCCGAGTGGGACAGAHTCCATCCCGTCCATGCCGGACCCRTTSCCCCT^TCAC-CGMGAT 
C1*V^GG^AATTGCTTCTCCGACGGCTCACCCTG'ICTYAGGTAGGGCAGGTACGGCCTGGGYAASGGGGAGAGTGGCKCTA 
X T INEEAAEWDRX H PVH AGPXX P'L T X- I> 

7S30 7540 7550 poll 81-21 0 (46) ?580 7 ^° ^00 

TTGTAMAGAAATGCAAVAAGAAGGC^ 

AAC A TKTCTTTACCTT BTTCTTC CGTTTT AG AGGTY CT AAC CGGG ACTCTT A GGG AT ATTGTGTGG GY AGAAAC GGT A/J G 
CXEMEXEGK1SX I GPENPVNTPXF A J> 

7610 7620 7630 7640 po! 871-900 (92) 7670 768 ? 

• * * * * 

/> AGTGAGAGASCAAGCCGAACACCTCAAGACAGCCGTCCAGATGGCAGTC^TTCATTCACAATTTCAAAAGGARAGGCGGA 
Tl'CACTCTCT^GTTCGG^TT'GT^GAGTTCTGTCGGCAGGTCTACCGTCAGAAGTAAGTGTTAAAGTTTTCCTYTCCGCCT 
< 2 VRXQAEHLKTAV0MAVFIHNFKRXGG> 
7690 



7700 



7730 



pol 211-240 (48) 



7740 



7750 



7760 



ATCGGAC^JaAAAAGAAAGATAGCACAAAGTGGAGCAAACTGGTAGAC^ 

TA GCCTCC GT'J'TTTCTTTCT ATC GTGTTTCACCTCCTTTG ACC A TCTG AAATCCCTCG AGTTGTTTGC ATGTGTCCT AAA 

_ I _ - _» ... w. «, t u r%TTO C t M V O «P n n F" 



7790 



7600 



DFRELNKRTQ D F> 
7830 7840 



env 540-569 (172) 

CTGGGAGGTCCAGCTCGGcjT'rrJ^GGCTCTGGCTTGGGATGACCTCAG^AGCCTGTG^^ 

G^cCCrcCAGGTCGAGCCOAAAARCCGAGACCGAACCCTACTGGAGTCCTCGGACACAGACAAGTCGATACTGTCTGACT 

wevoi*g'fxalawddlrs"i. clfsyhrl> 



7850 



7670 



7880 



™so vpr 76-96 (117) 



7920 



GAGACYTTATCClXTATCGyT^CCAGAAYcjTGCCRACATAGCAGAATCGGCATCACTAGGCAACGTAGAGSTAGGAACGGC 
CTCTGRAATAGGAGTAGCRACGGTC^rTRClACCGY^ 

RDXILIXARX T CXHSRIGITRQRRXRNG> 



spacers 



7S50 



7960 



7970 env 155-184 (147) 



8000 



KCCTCCAGGTCCjGCTGCC CCCAAARTCVrcCTTXTGAMCCCATTCCCATTCACTATTGCGCTCCCCCTGGCTT^GCTATCCT 
MGGAGGTCCAGC CGACGf GGGTi^AGWGGAAGCTKGGGTAAGGGTAAGTGATAACGCGAGGGCGACCGAWGCGATAGGA 
XCRS|AA 1 -~""- v ' 1TorUV '~ ,lPACXAIL ' : 



8010 



PKXXFXPIPrHYCAPACXAIl>> 
8020 8030 8040 8050 8080 



vif 76-105 (105) 

CAAGTGTAACRATAACAMhTTTCAATGGCjGAAARGGATra 

GTTCACATTGYTATTCTKKAAGTTACcdci^YCXrrAACCGTWGACCCTOTSCCTCACA^ 

_ _ I „ „ ... ~ ■ r~ /- \r tr T v w R V \Xr> 



CNXKXFNG 



8090 



8110 



XDWXLGXGVSIEWRXK> 
8120 8130 ~ ao* Aac* 8160 



gag 481-499 (33) 



t 



G STATAGCAC ACAGCTGG ACCCTG RCCTCGCCGATCAl 
CS ATATCGTGTGTCC A C CTGGGAC YGG A GCGGCTAC"*' 



JcTCTATCCl^CCTYAGCTTCCCTGAAAAGCCTCTTC 
yGAGATAGGAGGGARTCGAAGGGACTTTTCGGAGAAG JOin 



XYST0VDPXLAO0PSLyPPXASLKSLF> 

8 i7o spacers B200 82io vif 121-150 (108) 8240 



B3 
join 
B4 



ggaaacgatccctyatccca; 

AG 

x s o 

82S0 8260 



: \AACGATCCCTYATCCCAJJGCCGC-l AG AAGGGCTATCCTCGGCC AWAK AGTC AG S AG A A GGTGTGAGTATCKGKCCGG 
CCTTTGCTAGGGARTAGGGn CGGCGA rCTTCCCGATAGGAGCCGGTWTMTC AGTC STCTTCCACACTC ATAGKCMCGCC 
G N D P ~ * ^Ix » r»DATt. RXXVXRRCEYXXG: 



RRAILGXXVXRRCEYXXO 
8270 8280 8290 8300 8310 8320 



ACACAATAAGGTCGGCTCCCTCCAATACCTCGCACT^ 
TGTGTTATTCCAGCCGAGGGACGTTATGGAGCGTGA(3TCGGT^ 

HNKVGS LOY LAlJsQPXTACXKCYCKlO 



8400 



tat 16-45 (119) 835 ? 836 ? 837 ? pol 976-995 (99) 

GTTGCTWC C AC TGTC A GSTCTGCTTC C TG AMGAAGGGACTGGC A AT<|aGGG ATT ACGGAAAGC AAATGGCTGGCGMTG AC 
CAACGAVTCCTGACAGTCSAGACGAAGGACTKCTTCCCTGACCCTTAdTCCCTAATGCCTTTC 



CQXCFLXKGLGI 

spacers 8440 



RDYG K Q M A G X D> 

8450 pol 721-750(82) 8480 



C C X H 
8410 

• — — 

TGTGTGGCCRGCAGGCAAGACGAAGAC GCAGCC AAGTACC ATAGC AATTG G A G AACC ATGGC C A RTGA STTT AAC CTCCC 
ACAC ACCGGYCGTCCGTTCTGCTTCTC CGTCGC TTCATGGTATCGTTAACCTCTTGGTACCGGTYACTSAAATTGGAGGG 
CVAXRQDE D |_A_aJ KYHSNWRTMAXXFMLP> 



8490 



8500 



8530 



8S20 



8530 



8540 



8550 



8550 



C rr T ATCGTCSCTAAGGAAATCGTCGCAWRTTGCGATAAG^ 

gggatagcagsgattcctttagcagcgtwyaacgctattcac/Jttgc 

PIV XK EIV AXCDKC I NEWXLELLEELK> 
Vpr 16-45(113) B590 8600 8610 8620 8630 8640 

AWGAAGCCGTGAGACACTTTCCCAGACCCTGGCrTGCATGGCCTC 
tWtCGGCACTCTGTGAJ^GGGTCTC^ 

„ \. « y, o u fprpwLHGLGOH , DXISLWD0S> 
8650 env 106-144 (144) 8680 8690 8700 8710 8720 

CTGAAACCCTGTGTGAAACTGACACCCCTCTGCGTCACCCTCAACTGTACCAATGCCAATCTQMWGAAGAGMTACTCCAC 
GACTTTGGGACACACTl^GACTGTGGGGAGACGCAGTGGGAGTTGACATGGTTACGGTTAGA<JKV«rTTCTCKATGAGGTG 
LK PCVK LTP LCVT LNCTN ANl'xKX Y S T> 

8730 8740 Vlf 91 -1 20 (106) 8770 8780 8790 8800 

CCAAGTGGACCCCGRTCTGGCTGACCAWCTGATTCACCTCCACTATTTCGATTGCru w rKCCGATAGCRCAATQCATCCCA 
CCTTC ACCTGGG GC Y AG ACC G ACT GGTWGA CT AAGTGG AGCTG AT AAAGC TAACG AAAMGGCTATCGY GTTAQGT AGGGT 

qvd pxladxlihlhyfdcfxdsxi'h P> 

0810 8820 8830 net 1 66-1 95 (1 90) sseo 88?o ssso 

* • * * *. * 

TSRGCCWACACGGAATGGAGGATGAGGAWAGGGAAGTGCTGAWATGGAAATTCGATAGCCRTCTGGCTCKCACGCATATS 
ASYCGGWTGTGCCTTACCTCCTACTCCTWTCCCTTCACGACTVrTACCTTTAAGCTAT 

XXXHGM EDEXREVLX WKFDSXLAXRH X> 
8890 8900 8910 8920 poM51 -1 80 (44) 8550 8960 



I^^^CCTATCGAWACCGTCCCCGTCAAGCTCAAGCCTGGCATGGACGGACCCAAAGTGAAACAGTGGCCCCTCAC 
^S^^ GGATAGCTWTGGCAGGGGCAGTTCGAGTTCGGACCGTACCTGCCTGGGTTTCACTTTGTCACCGGGGAGTG join 
"if S PIXTVPVKLKPCMDCPKVKOWPLT> B5 



8970 



8980 8990 9000 9010 gag 436-465 (30) 9040 ^ 



CGAAGAGAAAATCAAAGCQATTTGGCCTAGCMRCAAGCGAAGGCC*rGGCAATTTC^ 
GCTTCTeTTTTACrrrTCGqTAAACCGGATC 

EEKI KA^IWPSXKGRPGN FXQSXPEPT> 

1 

9050 9060 9070 9080 9090 vif 31-60(102) 9120 

CACCCCGAGCCGAGARCTTTRGATTCGG(^TTAGCAAAAAGGCTAASGGATGGTTTTACAGACACCATTWCGAWAGCCRA 
GTGGGGGTCGGCTCTY G AAA YCTAA GCCOTA ATCG TTTTTCCGATT S C CTAC C AAAATGTCTGTGGT AA WGCTWTCG GYT 
appaexfxfg'i SKKAXGWFYRHHXXS X> 

9130 9140 9150 9160 9170 9180 9190 9200 

» * » * » 

CACCCTAAGGTCAGCTCCGAGGTCCACATTCCCCTCCGaATGATGACCGCTTGCCAAGGCGTCGGCGGACCCRGTCACAA 
CTGCGATTCCACTCGAGGCTCCAGGTGTAAGGCCAGCCOTACTACTG 
HPKVSS EVH I PLC'MMTACQGVGGPX H K> 



gag 346-375 (24) 9230 9240 9250 9260 9270 



9280 



AGCCAGGGTACTGGCAGAGCCTATGTCCCAGGYGAMCI^CGCTAACATlfc 

TCGGTCCC ATG AC CGT CTCC G ATAC AGGGTCCRCTKGKTG CGATTGTAJiGG AGGGT AAC AC SGGTTTCTCTAACACCGT W 
ARVL AEAMSQXXXAN i'pPI V X K E I V A> 

9290 po! 736-765 (83) «2o 9330 9340 9350 9360 

RCTCTGACAAATGCCAGCTCAAGGGTGAGGCTATKCACGGACAGGTCRACTGTAGCCC'DTCCGACGGAMCAAGACAGRCT 
YGACACTGTTTACGGTCGAGTTCCCACTCCGATAMGTGCCTGTCCACYTGACATCGGGJ|ACG 

XCDKCQLKGEAXHGQV XCSp'sEGXRQ X> 

9370 9380 rev 31-60(126) 94 i° 9420 9430 9440 
* * ' • * * * 

AGGARGAJVCAGACGTAGAAGGTGGCGTGMGAGGCAAAGGCAAATCCRCKCCATCTCCGAGWGGATTCTC ggacagatrag 
TCCTYCTTGTCTGCATCTTCCACCGCACKCTCCGTTTCCGTTTAGGYGMGGTAGAGGCTCWCCTAAGAC CCTCTCTAYTC 
RXNRRRRWBXR0RQIXX1SEXILG0X R> 

9450 9460 9470 gag 226-255 (16) 9500 9518 952 ? 

GGAACCCAGAGGCTCCGACATTGCCGGTACCACAAGCACACTGCAAGAGCAAATCGSATGGATGACAARCAATCCCCC'DR 
CCTWJCGTCTCCGAGGCTCTAACGGCCATGGTGTTCCTCTGACGTTCTCCT^ 

EPRG SDIAGTTSTLQEQIXWMTXNPP^ 

9530 9540 955.0 9560 pol 841 -870 (90) 9598 9608 

RCATTMAGCAAGAGTTTGGCATTCCCTATAACCCTCAGTCCCAGGGCGTCGTGGAAAGCATGAACAAAGAGCTCAAG 
YGTAAXTCGTTCTCAAACCGTAAGCGATATTGGGAGTCAGCCTCCCGCAGCACCTTTCGTACTTGTTTCTCGAGTTCTTT 
XIXQE FGIPYNPQSOGVVESMNKELKIO 
9610 9620 9630 nef 1 06-1 35 (1 86) 9^60 9670 9680 

atcattggcJagacacgagatcctcgatctctgggtctacmatacccaaggciwtttc 

TAGTAACCqTCTGTCCTCTAGGAGCTAGAGACCCAGATGKTATGGGTirCGAWAAA 

iig'rqei ldlwvyxtqgxfpdwxnytp> 
9690 9-700 97X0 9720 rev 46-75 (1 27) 9750 9760 



AGAGMAAGACAGAGACAGATTCRTKCTATTAGCGAAWGGATTCTCAGCAMCTKCC 
TCTCKTTCTGTCT^TGTCTAAGYAMGATAATCGCTTWCCTAAGAGTCGTKGAMGG 



CGGACCCGGARYCAGATAC 

(yCTGGGCCTYRGTCTAT q 

G PGXRYPSRXRQRQIXXISEXILSXX> 

9770 9780 9790 9800 9810 gag 301 -330 (21 ) 9«4° 

* * * J * * 

TCGGCACAYCCGCTGAGCCTGTGCeTCTGCAACTOTOT^ 
AGCCG^TRGGGGACTCGGACACGGAGACGTTGACUWATTCTGTGACTCTC^ 

lgrxaepvplql'xk.tlraeqaxqxvkn* 

9850 9860 9870 9880 9890 9900 9910 9920 

TGGATGACCGA S AC ACTGCTCGTGG AAAACGCTAACCCTG ACTGtJgAGA RAGTGTATCTG KCTTGCCTCCCC GCTC ATAA 
ACCTACTGCCTSTGTGACGAGCACGTTTTGCGATTGGGAC^^ 
WMTXTLLVQNAN PDC'EXVYLXWVPA HK> 

pol 676-705 (79) 99S ? " 6 ? "™ 99S ° s "°. 1000 °. 
aggcattggcggaaacgaacaggtggacaaactggtcakckctggcattaggaa/Kcagaccctaaccctcaggaartcs 

TCCGTAACCGC CTTTGCTTGTCC AC CTGTTTGACC A GTMGMGAC CGT AATCCTTTjTCTCTGGGATTGGGAGTCCTT Y AG S 

gi ggneqvdklvxxgirk'td p n p q e x> 
I iooio env 76-105 (142) 10040 1005 °. 10060 10070 10080 

WTCTGGAAAACCTCACCGAGAACT*7^AACATGTGGAAAAACRATATGGTGGASCAAATGCA 

wagaccttttccagtggctcttcaaattc 

xlenvtenfnmwknxmvxqmxe'ag xa i> 

10090 ioioo env 170-199 (148) 10140 101S0 10160 

* * L 

C TG AAATGCAATR AC AAAAM STTC AACGG AACTGG A C CCTGTAM G AATGTGTCC ASCGTCC AGTGTACCC ATGGCjCWAG A 
GACTTTACGrrAYTGTTTTKSAAGrTTGCCTTGACCTGGGACATKCTTACACAGGTSGCAGGTCAC^TGGG 

lkcnxkxfwgtgpcxnvsxvqcthgxe> 

10170 10180 10190 env 600-629 (176) 10220 1023 ° 10240 

GCTC AAG AWT AGCGCTRTCTCCCTGCTC aacgctaccgctatcgctgtggctg rg k ggac cgataggrttatcg aagtgg 
ccagttctwatcgcgayagacggacgagttgccatcgcgatag^^ 

LKXSAXSLLNAT AIAVAXXTDRXI EV> 

10250 10260 10270 10280 vif 46-75 (1 03) 1031 °. 10320 

VTCA<jTCCCRGCATCCCAAACTGTCCAGCGAAGTCCATATCCCTCrGGGAGASGCTAGCCTCRTCATTARGACATACTGG 
R>GTdAGGGYCGTAGGG-rTTCACAGGTCGCTTCACGTATAGGGAGACC 

XO'SXHPKVSS EVHI PLGXARLXIXTYW> 

10330 spacers^ 10360 10370 nef 1-30 (179) 10400 . 



• r I \^ " 

GCCCTCCASACAGGC GCTGC1 ATGGGCGGT AAATGGTC C AAGWGCTCC C YCGTCGG ATGGCCCGMA GTGAGAG AGAG AAT 
CCGGAGGTSTGTCCC CCACC* T ACCC CC C ATTTACC AGGTTC WCC AGG G RGC A GCCT ACC GGGC K TC ACTCTCTCTCTT A 

glxtga amggkwskxsxvgwpxvrer i> 



B5 
join 
B6 



10410 10420 10430 10440 10450 pol 496-525 (67) 10480 

CACACRGRCASCCCCTGCCCCTGAGGCACTdCTCAAGACCGGCAAG™ 

gtctgycygtscgggacggccactccctcacIgagttctggccgttc^^ A 
rxxxpaaegv'lktgky xrxrxahtnd> ¥ 

10490 10500 10510 10520 10530 10540 10550 10560 B6 

tcargcaactgacacmggytc^tccaaaacattcccacagac 

AGTYCGTTCACTGTCKCCRACACGTTTTCT^ D/ 
V XQLTXXVQKIATESSWEXLKYXXNLb> I 
10630 10640 



env 585-614 (175) 10590 10600 1061 ? 1062 ? 

CWGT AC TGGGGC C WGGAACT G AAA AWCTCCGC C RTC AGC C TCCTGAATGCCAC AGCfl ATTS WGCTGC CTG AG AAA G AW AG 
GHCATGACCCCGGWCCTTGACTTTTW^^ 
X Y WGX ELKX S A XS.LLNATA'l X LPEK X S> 

10650 pel 391-420 (60) 10680 10690 30700 10710 10720 

CTGGACCGTCAACCATATCCAAAAGCTCGTGGGAAAGCTCAACTGGGC^ 

GACCTGGCACTTGCT AT AGGTTTTC G AGC ACCCTTTCG AGTTG ACCCGTAGGGTCT AAATGS GGCdjTCT CGGT AA CTC C 
WTV. NDI0KLVGKLNWASQIYXG»RAI E> 

10730 10740 e nv 345-374 (1 59} 10770 10780 10790 10800 




10810 10820 10830 pol 631 -660 (76) * 0860 10870 



CTC 
IGAG 
L> 



10880 



GCCCTCCAGGATAG^GGATYGGAAGTGAATATCGTCACCGATAGCCAATACGCTCTAGGCATCATTCWGGCTCAGCCTGA 
CGGGAGGTCCTATCGCCTARCCTTCACTTATAGCAGTGGCTATCGGTTATGCGAGATCCGTAGTTAAGWCCGAGTCGGACT 
ALQDSGXEVN 1 VTDSQYALGI IXAQP D> 

10890 10900 10910 10920 e nv 420-449 (1 64) i°95° 10960 
• * * * 

caraagJgaaagggaaatctccaactataccartcwgatttacrag 

CTYTTCdcTTTCCCTTTAGAGGTTGATAT^ 

X s" E R E I SNY T X X I YXI L TESQNOOD R> 

10970 10980 10990 11000 uoio env 285-31 4 (1 55) 11040 




11050 11060 11070 11080 11090 pol 91 -1 20 (40) 11120 

I * * * 

ATCWTTYTCGGATTCCTCGCCGCTGCd^ 

tacwaaragcctaaccacccgcgaccStttgggt^ 

MXXCFLGAA'K PKM I G G I GGF1 KVRQY D> 
11130 11140 11150 11160 11170 11180 11190 11200 



CCAAATOtTTATCC^TCTGTGGAMASAAGGCTATcfrcCTAC 
GGTT^AGKAAT AGC7TTTAGAC ACC TK TS TTCC GATAQAGG ATGGT ATCCGA 

CI X1EICGX K Al'sYHRLRDFI L I X A R> 

env 555-584 (173) ""0 11240 11250 11260 11270 1128 ? 

YTGTGGAACTGCTCGGCCRTACC^CCCTGAPAGGCCTCCRGAGAGGfjACACTGAATGCCTGGGTGAAAGTGRTTGAGGAA 
RACACCTTGACGAGCCGGYATCGAGGGACTYTCCGGAGGYCTCTCCdTGTGACTTACGGACCCACTTTCACYAACTCCl*T 

xvellgxsslxglxrg'ti>nawvkvxee> 



""o gag 151-180 (11) 11320 1133 ? n34 ? 11350 11360 

AAGGSATTCARTCCCGAAGTGATTCCCATGTTTWCCGCTCTGTCCGAGGGAGCCACi^ 
TTCCSTAAGTYAGGGCTTCACTAAGGGTACAAAWGGCGAGACAGGCTCCCTCGGTGT[gC 
r» v b c\/T DM FXA LSEGATL 



t 

B7 



AGC AAC AC A SCCGCT A A 
TCGTTGTGTSGGCGATT join 



K X F XPEVIPMPXALSEGATLESNTXAN>C1 

H370 meo nef 46-75(182) »»? ' 1142 ? 1142 ° 1144 ? { 




iGAGCCK 

:tcggm 

A> 



»«o env 630-651 (178) 11490 *P aCerS 1152 ? 

GGAGGG^TATCCTCMACATTCCX'ASGAGGATTACGCAAGGCYTTGAGAGAGCCCTCCT/ gccgcc gaatgggatacgrtt 
CCTrcCGATAGGACKTGTAAGGGrrSCTCCTAATCCGTTCCGRAACTCTCTCGGGAGGAn CGGCGC CTTACCCTATCCYAA 
TT vxDvn TROGXERALLAAEWDRX:- 



xrailxipxrirqgxerall 
,1570 11580 11590 H«00 




12 no 12180 12190 pol 241-270 (50) 



AGLKKKKSVT 

12280 pol 541-570 (70) 



12220 12230 12240 

XTCCC 
iGGGC 
! P 

12310 12320 
12490 125G0 12510 12520 gag 1 36-1 65 (1 0) 12550 12560 

* * I * * * * 

GCCGCCARCAGAGAOACAAACCTCGGQCAAAACSyCCAGGGACAGATGGTGCATCACSCTMTTfeGCCCCAGGACCCTCAA 

cggcggtygtctctctgtttcgagccc|gtttt<>srggtccctgtctaccacgtagtcsgakaatcggggtcctgggagtt 
a a x r e tk lg'on xqgqmvhqxxsp rtln> 

12570 12580 12590 12600 12610 enV 61-90 (141) 1264 °. 

CGCTTGGGTCAAGGTCRTCGAACAGAAAGSCTTTAROGAHACCGAAGTGCATAACGTCTGGGCTACCCATGCCTGTGTGC 
GCCAACCCACTTCCAGYAGCTTCTCTTTCSCAAATYdcTKTGGCTTCACGTATTGCAGACGCGATGGGTACGGACACACG 
AWUXVXEEX XFX'XTEVHNVWATH ACV> 

12650 1266C 12670 12680 12690 12700 12710 12720 

* i * 

CTACCGATCCCAATCCCCAAi^GRTTSWCCTO 

GATGGCTAGGGTT AGGGGTTCTC Y AASWGG AC CTC TTAC ACTGTCTGG AGTTC CTAGTC KTTRAGGAGCCGK AAACCCC T 
PTDPN PQEXXLENVTe'lKDQXXLGXWO 

env 375-404 (161) 12750 12750 32770 1278 °. 1279 °. 1280 °. 

TGCTCCGGCAAAMTCATTTGCACAACCRMTGTGCC^TGGAACAGCWCCTGGTCCAAu 

ACGAGGCCGTTTIO«rrAAACCTGTTGGYKACACGGAACCTTGTCGWGGACCAGGrrdGKTKGACCGGTAT^ 

csckxicttxvpwnsxwsm'xxchnxvo 

12810 vif 136-165 (109) 12640 12850 12860 12870 1288 ? 
aagcctccagtatctggctctgamggctctcat™ 

T*7^GGAGGTCATAGACCGAGACTKCCGAGACTAATKCGGATTCTTTTAGTYTGGGGGAGACGGATCOCRATTCTG7~TAGT 
SLOYLALXALIXPKKIXPPLPS'XKTI? 

12890 12900 env 230-254 (152) 12930 ^pacers 12960 J. 

o^^SIgaagwa C2 



A A 



tttcwt join 

A S E X> C3 



ttgtgcajctgaatragtccgtggwaatcaattccacaagccctarcaataacacaaggam;go 

AACACGTAGACTTAYTCAGGCACCV^AGTTAACGTGTTCCGGATYGTTATTGTGTTCCTK'l CG 
I VHLNX SVXI NCTRPXNNTRX|A 

12970 12980 12990 gag 1 06-1 35 (8) 13020 13030 13040 j 

CAGAAWAAGTCCMAACAGAAAACCCACCAAGCCGCCGC^ 
GTCTTWTTCAGGKTTGTCTTTTGGCTCGTTCCGCGGCCGCT^ 
C X KSXOKTOQAAADTGXSSXVSQNYP I> 

13050 13O60 13070 13080 p 0 | 826-855 (89) 13110 13120 

* * * * 

tgtJtccaactttacctccrccrctgtgaa^ 

acacIaggttgaaatggaggtggygacactttcggcgaacaaccacccggyyatagkttgtcctcaaaccttagggaatgt 

1 xXVKAACWWAXIXQEPGI PY> 



T S 



13130 13140 13150 13160 13170 po! 586-615 (73) 13200 

atccccaaagccaaUcattctatgtggatggcgctgccartagggaaaccaaactgggaaaggctcgctatgtgacagac 

TAGGGGTI^CGGTTjTGTAAGATACACCTACCGCGACGGTYATXTCCTTTGGTTTGACCCTTTCCGACCGATACACTGTCTG 

npqsq'tfyvdgaaxretklgkagyvtd> 

13210 13220 13230 13240 13250 pol 766-795 (85) 13280 

♦ • ♦ * * * 

AGAGGCAGACAGAAARTCRT*rAGc|GGAATCTGGCAGCTCGACTGTACCCATCTGGAAGGCAAARTCATTCTGGTAGCCGT 
TCTCCGTCTGTCTT^YAGYAATCqCCTTAGACCGTCGAGCTGACATGGGTAGACCTTCCGTTTYAGTAAGACCATCGGCA 
P G RQKX XSIG I WQ L DCT H L EG K X I L V A V> 

13290 13300 13310 13320 13330 13340 13350 13360 

CCACGTCGCCTCCGGCTACATTGAGCCTGAGGTdGGCAATGAGCAAGTGGATAAGCTCGTGAKTKCCGGAATCAGAAAGG 
GOTGCAGCGGAGGCCGATGTAACTCCGACTCCA^CGTTACTCGTTCACCrATTCGAGCACTMAMGGCCTTAGTCTTTCC 
H VASGY I E A EV»GN EQVDKLVX XG I RK> 

pol 691-720 (80) 13390 13400 13410 13420 13430 13440 

TGCTA7^<rCTCGACGGAATCRATAAGGCTCAGGAAGAGCACGA/|GT^AGGGAAAGGATTAGGCnARCCSCTCCCGCTGCT 

acgataaggagctgccttagytattccgagtccttctcgtgcttjcactccctttcctaatccgytyccsgagggcgacga 

VLFLDGIXKA0EEHEVRERIRXXXPAA> 
nef 16-45 (180) 13470 i34ro 13490 13500 13510 13520 

GAAGGC6TCGGCGCTG YCTCCCRGGATCTGGATA AGK ACGG AGCCMTCACCTC QAC AAGCGGAA CCCAACAGTCCC AGGG 
CTTCCGCAGCCGCGACRGAGGGYCCTAGACCTATTCMTGCCTCGGKAGTGGAGuTGTTCGCCTTGGGTTC 
E CV-GAXSJCDLDK1CAXTS I TSCTQQS0G> 

13530 rev 91-120 (130) 1356 °. 13570 135B ° 1359 °. 13600 

AACTGAAACTGGCGTCGGOWCCCTCAGATl^rYGGGACAGTCCAGCGYTRTCCTCGGCYCCGGCJTCCATCGTCATCTGGG 
TTGACTTTGACCGCACCCCKYCGGACTCTAAARCCCTCTCAGGTCCCRAYAGG 

TETGVGX PQ1 XGESSXXLGXc'si VI W> 



13610 13620 pol 526-555 (69) 33650 13660 spacers 



GTAAAACCCCTAAGTTTARGCTCCCCATTCAGARAGAGACATGGGAARCCTGGTGGAyGGASTATTGGCAAGCC SCTGCI 

CATTTTGGGGATTCAAATYCGAGGGGTAAGTCTYTCTCTGTACCC^ -CACGA 
GKTPKFXLPIQXETWEXWWXXYWOAAA> 



13690 13700 13710 en V 1 40-1 69 (1 46) 1*740 13V>0 13760 

TACAGACTGATCARCTGTAACACAAGCGYTATCAMACAGGCTa^ 

ATGTCTGACTAGTyGACATTCTGTTCGCRATA»rTKTGTCCGAACGGGATTCYAATSGAAAC7SGGATAGG 
YRLIXCNTSXIXQACPKXXFXPIPIHY>. 

13770 13780 13790 13800 pol 376-405 (59) l 38 ^ 0 13840 T 

' * C3 

^^pTCGATGGGCTATGAGCTCCACCCTGACAGATGGACAGTGCAACCCATC SWGCTCCCCGAAAAGG . . 
f^ACCTACCCGATACTCGAGGTGGGACTGTCTACCTGTCACCTTGGGTAGSWCGAGGGGCTTTTCC JOIfl 
SWMGYELHPDRWTVQPI xlpe k> C4 



CTGTGCCCC 
GACACCGGG 



/ 13850 13860 13870 13880 13890 gag 331-360 (23) 13920 ^ 

/ ASTCCTC gacactcaatcacattcagJaaawcaattctgjuiag^ 

TSAGGACCTGTCACTTACTGTAAGTdTTTWGTTAAGACTYTCGGGAG 

XSWTVNDIQ 7 KXI L.XALGXGAXL. EEMM T> 
13930 13940 13950 13960 13970 13900 13990 14000 

GCATX»TCAG<5GAGTGGGAGGCCCTRGCCATAAGGcJAGAGTGTATTACAGAGACTCCAGCGACCCC^fI^TrcGAAACCCCC 

cgtacagtcc*^accctccggcaycgctattccg/|tctc^ 
acqgvccpxhka'rvyyrdsrdpxwkg p> 

pol 931-960 (96) 14030 1404 ? 14050 14060 14070 14080 
tgccaaactgctctggaaaggcgaaggcgctgtggtcatcc aagagrttaagattggaggccaactgawagaagccctcc 

ACGGTTTG ACGAG ACCTTTCCGCTTCC GC G ACACC AGTAGGTTC To Y AATTCT AACCTCCGGTTG A CTVTTCTTCGGG AGG 

akllwkgegavviod'xkigcqlxeal* 

14090 pol 61-90 (38) 14120 14130 14140 14150 14160 

tggatacaggagccgatgacaccgtcctggaagawatsaatctgcctc^ 
acctatgtcctcggctactctggcaggaccttctotasttag^ 

ldtgaddtvlexxnlpgxw'gikqlqar> 
i4i>o i4i8c env 360-389 (160) * 4210 14220 s pacers 



gtcctggctrtcgagaggtatctgaaagatcaamagyttctgggaotctcgggctgtagcggaaac gctgci atggaaaa 



GTCCTGW. T K J\<»AbAW»i * * " *- * 

CAGGACCGAYACCTCTCCATAGACTTTCTAGTTKTCRAAGACCCTKACACCCCGACATCGCCTTTC CGACG£ 

vlaxerylkdqxxlg'xwgcsgk AA 



tacctttt 

M E N> 



14250 14260 14270 Vlf 1-30 (100) 14300 14310 14320 

* * * 

cagatggcaagtgmtgatcgtctggcaagtggacaggatgargattaggacatggaawagcctcgtgaaacaccatatgy 
gtctaccgttcackactagcacaccgttcacctgtcctactyctaatcctgtaccttwtcgg 

RW QVXI VWQVDRMXIRTWXSLVKHHM> 

14330 34340 14350 14360 en v 390-41 9 (1 62) 14390 14400 
1 4410 14420 14430 VpU 1 6-45 (133) 14460 14470 14480 

TGGAT K S AATGCjCTG ATTMTCGCTATCGTC GTGTGG ACCATTGYGTWTATCGA ATAC ARGAAACTGCTC ARGCAAAGGAR 
ACCTAMSTTACaGACTAAKAGCGATAGCAGCACACCTGGTAACRCAWATAGCTTATGTYCTTTGACGAGTYCGTTTCCTY 
WXXW*LIXAIVVWTIXXIEYXKLLXQRX> 

14490 14500 14510 14520 gag 46"75 (4) 14550 14560 

« * * # -• * 

AATCGATAGGCTCATCRAAAGGCTCAACCCTCGCCTCCTGGAAACCKCTGAGGGATGTMAACAGATCCTGGRACAGCTCC 
TTAGCTATCCGAGTAGYTr^:dGACTTC«GACCGGAGGACCTTTGG^K»CTCCCTACAKTTGTCTAGGA 

I D R ' L I X.R L N P C L L E T X EG C X Q I L X Q I>> i 

14570 14580 14590 14600 14610 14620 14630 14640 I 

* * * * * C4 

AGYCCGCCCTCMAGACAGGCWCCGAAGAGCTC^^^^iAGAAAGCTCCTGARACAGAGAARGATTGACAGACTGATTRAG . . 

TC RGGCGGG AGKTCTGTCC GWGG CTTC TC G A<^^^^^TCTTTCGAGG ACT YTGTCTCTT YCTAACTGTCTGACTA A Y TC J 01 11 
QXALXTGX EEL.SSRKLLXQRXI DRLIX> C5 

VpU 31-60 (134) 14670 14680 14690 14700 14710 14720 

AG AA Y C AG AGAG AGAGCCGAAG ACTC C G G CAATG AGTCCGAGGG AGAGAC ACCCGG AATCAG ATACC AAT AC AATGTGCT 
TCTTRGTCTCTCTCTCGGCTTCTG AGGC C GTTACTC AG GCTC C CTCTOTCTGCGCCTTAGTCT ATGGTT ATGTTACACGA 
R X R E R A E D S G N E SECD T PGI R Y Q YNV L> 

14730 pol 286-315 (53) 3 4760 14770 14780 14790 14800 

CCCCCAAGCCTGGAAGGCCTCCCCASCCATTTTCCAAAGCTCCATC 
GGGGCTTCCGACCTTCCCGAGGGCTSGGTAAAAGG^TCGAGGTACKGGKTT^^ 

PQGWKGS PX I FQ SSMXX Il'mmQRGNF> 

I '* 

14810 i482p gag 376-405 (26) 16850 14860 i487o i4aeo 

RGGCACMGAAAAGGATTRTCAAGTGCTTCAACTGTGGAAAGGAAGGCCATMTCC 

YCCCTGKCTTTTCCTAAYAGTTC A C GA AGTTG ACACCTTTCCTTCCGGTAKJVGCGATYCTTAACGTCt|gGAGGGGACCTC 
X GX KRI XK C FN CG KEGH XAXNC R ' P PL E> 

14890 .14900 14910 reV 76-1 05 (1 29) 14 9 40 14950 14960 

agactgmacctggattgctccgaggatwgcgrcacctccggcacacagcaaagccaaggcacagagacaggagtgggaIct 
tctg acktgg acctaa cg aggctccta wc gc y gtgg a ggccgtgtgtcgtttcgg ttccgtgtc tctgtcctc accctjg a 
r lxldcs e dx x tsgtqqsqgt etgvg l> 

14970 14980 14990 15000 pol 781-81 0 (86) 1^030 15040 

CGTGX5CTGTGCATGTG<3CCACCGGATATATCGAAGCCGAAGTGATCCCro 

GC ACCGAC ACGfTACAeCGGTCGCCTATATAGCTTCGGCTTCACTAGGGACGGCTTTGACCTGTC CTTTGGC-GAATGAA AK 
VAVHVASGY 1EAEVIPAETGQ ETAYF* 

isoso 15060 iso7o 15080 15090 env 200-229 (1 50) 1S120 
. • » • • 1 ' * 

TCCTC AAOA TTARGCCTGTGGTC AG C AC AC AGCTCCTGCTCAACGGT ACC CTCGCTC AAC ACG AA RTC R TTATC AGAAGC 
AGGAGTTOTAATYCGGACACCAGTCGTGTGTCGAGGACGAGTTGCCATCGGAGCGACTTCTCCTTYAGYAATAGTCTTCG 
x lk'ixpvv STQLLLNGSLAEEE XX I RS> 

15130 15140 15150 15160 15170 pol 406-435 (61 ) 15200 

gaaaacyttaccrataaciaaactggtcggcaaactgaat'tggg 

c ttttgr aatgg yta ttcftttg a cc agc cgtttg actt aacccg aagggttt acatg sg acc gt agtttc a ctycgttg a 
enxtxn'kl>vcklnwasqiyxgikvxql> 

15210 15220 15230 15240 15250 enV 1 21 -1 39 (1 45) 15280 

* • * * * * 

GTGTAAGCTCCTGAGAGGCRCCAAAGcdcTCACCCCTCTGTGTGTGACACTGAATTGCACAAACGCTAACCTCATCAATG 
CACATTCGAGGACTCTCCGYCGT^TCGcfcAGTGGGGAGACACACACTGTGACTTAACGTGTTTGCGATrGGAGTAGTTAC 
C KLLR GX K A ' L T p'lCVTLNCTN AN L I N> 

spacers 15310 15320 15330 tat 76-102 (123) 15 36o 



TGAA' GCTCC1 CAAMCCAGAGGCGATAACCCTACCGRTCCCRAAGAGTCCAAGAAARAGGTCGMGTCCAAGRCAGAGACA 
ACTT/jCGACG^GTTKGGrCTCCGCTATTGGGATGGCYACGGYTTCTCAGGTTCTTTYTCCAGCKCAGGT'rCYGTCTCTGT 
QXRG DNPTXPXESKKXVXSKXET> 
spacers 



15390 



15400 



rev 61-90 (128) 



15430 



15440 



GACCCTTKTGAC GCCGCC ^^pjrCCAJlCTKTCTGGGAAGGYCTGCCGAACCCGTCCCCCTCCAGCTCCCCCCTCTGGA 
CTGGGAAMACTC CGGCGC ^^^AGGTKGAHAGACCCTTCCRGACGGCTTGGGCAGCCGGAGGTCGAGGGGGGAGACCT 
AAPSSXXLGRXAEPVPLQLPPL. E> 



15500 



15520 




C5 
join 
C6 



env 450-479 (166) 



15600 



ASTGGCTGTGGTACATTAAGATTTTCATTATGATTG 



TSACCGACACCATGTAATTCTAAAAGTAA 

xwlwyikifimivgg'nkivrmyxpvs i> 



15610 



gag 271-300(19) ls "° 1S6S ? 1566 ? 15 "? 1568 ? 

r-TCGACATTARGCAAGGCCCTAAGGAACCCTTCAGGGATTACGTGGACAGATT^CTAAGCTCCTCT^ 
GAGCTGTAATYCCTTCeGGGATTCCTTGGGAAGTCCCTAATGCA 
LD I XQGPKEP F R D Y V D R F 1 A K LLWKGE G 



156S0 15700 p 0 | 946-975 (97) 



15730 



15740 



15750 



G> 
1S760 



AGCCGTCGTCATTCACGACAACTCCGACATTAAGGTCGTGCCCAGGAGAAAGCCT^ 
GAdGAGTl^ACCTTCAAACTGAGGGYGGAGCGGG«CTCTGTATASCGGTCCCTTGACGYAGGG 
16330 env 270-299 (154) 16360 1637 ? 1638 ? 16390 164 °? 

RAACCCCTCGGCRTTGCCCCTACCARAGCCAAAAGGAGAGTGGTCSAGAGAGAGAAAAGC CTCACCGAWATCGTCMCACT 
YTTGGGG AGCCG Y AAC G GGGATGGT YTC GGTTTTC C TCTC ACC AG S TCTCTCTCTTTTCC GAGTGGCTWTAGCAGKGTGA 
XPL .GXA P T X AKRRVVXRE KRLTXI VX L> 

i6 4 io i642o po! 436-465 (63) 16450 16460 16470 1648 °. 

C AC CG AAG AGGCTG AGC TGG AGCTGGMGG AA AAC AGAG AGATTCTGARGG AACCC GTCC ACGGAGTGTATjAG AGTGCTC G 
GTGGCTTCTCCOAC^GACCTCGACCKCCTTTTGTCTCIW 

TEEAELELXENREI LXEPVHGVY»RVL> 

16490 i65oo 16510 gag 361 -390 (25) 16540 16SS0 16560 

CCGAAGCCATGAGCCAAGYCAMCMATGCCAACATCATGATGCAGAGAGGCAATTTCARAGGCCMAAAGAGAATCRTCA^ 
GGCTTC CGT ACTC GGTTC RGT KG KT ACGGTTGTAGTACTACGTCTCT*CCGTT AAAGT YTCCGG KTTTCTCTT AGYAG TTIJ 
AEAMSOXXXANI.MMORGNFXGXKRIXK> 

16ST0 16580 16590 16600 nef 61 -90 (183) 16630 16640 

CAAGAGGAAGAGGRGGTCGGCTTCCCCGTCAGGCCTCAGGTCCCACTGAGACCTATGACCTACAAAGSAGCCRTCGATCT 
GTTCTCCTTCTCCYCCAGCCGAAGGGGCAGTCCGGAGTCCAGGGTGACTCTGGATACTGGATGTTTCSTCGGYAGCTAGA 

qeeexvgf pvrpqvp lrpmtykxaxd l> 

16650 16660 16670 16680 16690 ga g 286-31 5 (20) 1^720 

GTCCYTCTTOARACAGGCACCCAAAaAGCCTTTCAGAGACTATGTGGATAGGTTTT^ 

CAGGRAGAAdr^TGTCCCTGGGTTTCTCGGAAJVGTCTCTGATACACC^ATCCAAAAWG'i l I iX^GGAGTCCCGACTCGTTC 
SXF*XQG PK EPPRDYVDRF X K T L. R A E Q> 

16730 16740 16750 16760 16770 gag 16-45 (2) 16800 

CCWCACAGGAWGTGAAAAAaTGGGAGAAAATCAGACTGAGACCTGGTGGCAAAAAGAAATACARAMTGAAACACMTTGTG 
GG WGTGTCC TWC A CTTTTTG AC C CTCTTTT AGTCTG ACTCTGCACC ACCGTTTTTCTTT ATGT YTKA CTTTGTG KAAC AC 

axoxvkn'weki rlrpggkkkyxxkhxv> 
^^^^CGAGGTCCCTTGACCTTTCCAAACGdAGGGTCATACGGGAGCCGTACTAGGWTCGGGTTGGGCTATYCAG^CTCAG 
WAS R E L E RFA'S QYALG I I XAQP DX SE S> 
CG AGSTCGTGARTCAGATTATCGAAVAGCTC ATCAAGAA dATTGCCGTC GCCG RA K GGACAGACAGARTCATTGAGGTCG 
GCTCSAGCACTYAGTCTTAATAGCTrBTCGAGTAGTTCTTC|TAACGGCAGCGGCYTMCCTGTCTGTCTYAGTAACTC 

EX VXQIIEXLIKK , IAVAXXTDRX IEV> . 

env 615-644 (177) 16990 17000 17010 1702 °. 17030 1704 ? ' 

YCCAAATOGCTKGGAGAGCCATTCTGMATATCCCCASGAGAATCAGACA<^^aCTCGCCGGAAGGTGGCCCGTCARG CJ 

RGGTTTCCCGAMCCTCTCXXn-AAGACKTATAGCGGTSCTCTTAGTCTO J 0,n 
XORAXRAII. XIPXRIRQTRLAGRWPVX> C8 

^oso pol 811-840 (88) 17080 17090 17100 17110 1?12 ? { 

RYAATCCATACCGATAACGGAACCAATTTCACAAGCRCTRCW 

Y RTTAGGTATGGCTATTGCCTTCG VTAAAGTGTTCGYGA YGGC AGTTCCGACGGACG ACC ACCCG/1CTACACTYTGTCGA 

xihtdngsnftsxxvkaacwwa'dvxql> 
caccgmagycgtccagaaartcgctaccgaaagcattctgatatggggaaagacacCcaagttcaractgcctatc gctc 
gtggcktcrgcaggtctttyagcgatggctttcgtaacactatacccctttctgtgggttcaagtytgacggatac ccac 
txxvqkxatesiviwgktpkfx lpi I a> 
v TC CTTTCC AC AA ATC AC ACTGTGG C A A AGG CCTCTGGTC A CCGA AC CCTTC AG A A W^^G AATCCC G A W ATG G TG AT 

J x x F P Q I T L W Q ~R P L V TE P F R X X N P X M V 1> 

330 340 350 360 370 380 390 400 

TTACCAGTACATCGACGATCTGTATCT^GA^ 
/* Scramble */

/* Includes */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Constant definitions */

/* Version Information */
#define VERSION_NO              "1.02"
#define VERSION_DATE            "04/03/1999"

/* Misc */

#define KEYBOARD_BUFFER_SIZE    /* value illegible */  /* size of keyboard read buffer */
#define LEN_CODON               4       /* length of codon (including null) */
#define BUFFER_SIZE             10000   /* size of file read buffer */
#define TRUE                    1       /* boolean true */
#define FALSE                   0       /* boolean false */

/* Error codes */

#define E_NOERROR               0       /* no error */
#define E_NOINFILE              1       /* genes file not found */
#define E_MALLOC                2       /* memory allocation error */
#define E_FILEREAD              3       /* file read error */
#define E_CREATE_OUTPUT_FILE    4       /* error creating output file */
#define E_OVERLAP               5       /* segment overlap >= length */

/* Structure definitions */

typedef struct gene GENE;

typedef GENE * P_GENE;

typedef struct gene_segment GENE_SEGMENT;

typedef GENE_SEGMENT * P_GENE_SEGMENT;

struct gene {
    char * name;
    char * data;
    P_GENE next_gene;
};

struct gene_segment {
    P_GENE p_gene;
    int number;
    int offset;
    int first_codon_choice;
    char * amino_data;
    char * dna_data;
    P_GENE_SEGMENT next_seg;
};
/* Function prototypes */

int prolog();
int get_parameters();
int read_int(char * prompt);
int load_genes();
int add_gene(char * gene_name,char * gene_data);
void insert_gene(P_GENE * head,P_GENE new_gene);
int add_aa();
int split_genes();
int split_gene(P_GENE g);
int insert_segment(P_GENE_SEGMENT * head_seg,P_GENE_SEGMENT new_seg);
int convert_segments_aa_to_dna();
int convert_aa_to_dna(char * aa_ptr,char * dna_ptr,int first_choice);
char * codon(char acid_char,int preferred);
int perform_scramble();
int scramble_segments();
int adjacent_segments();
int display_genes();
int write_output_file();
void strip_newline(char * strip_str);
void pad_amino_string(char * amino_ptr, char * padded_ptr);
int even(int test_num);
void read_str(char * prompt,char * string);
char * read_nonblank_line(char * buf,int buf_size,FILE * in_file);
int user_confirmation();
void test();

/* Global variables */

char * codon_table[26][2] = {
    /* A 00 */ {"GCC","GCT"},
    /* - 01 */ {"???","???"},
    /* C 02 */ {"TGC","TGT"},
    /* D 03 */ {"GAC","GAT"},
    /* E 04 */ {"GAG","GAA"},
    /* F 05 */ {"TTC","TTT"},
    /* G 06 */ {"GGC","GGA"},
    /* H 07 */ {"CAC","CAT"},
    /* I 08 */ {"ATC","ATT"},
    /* - 09 */ {"???","???"},
    /* K 10 */ {"AAG","AAA"},
    /* L 11 */ {"CTG","CTC"},
    /* M 12 */ {"ATG","ATG"},
    /* N 13 */ {"AAC","AAT"},
    /* - 14 */ {"???","???"},
    /* P 15 */ {"CCC","CCT"},
    /* Q 16 */ {"CAG","CAA"},
    /* R 17 */ {"AGG","AGA"},
    /* S 18 */ {"AGC","TCC"},
    /* T 19 */ {"ACC","ACA"},
    /* - 20 */ {"???","???"},
    /* V 21 */ {"GTG","GTC"},
    /* W 22 */ {"TGG","TGG"},
    /* - 23 */ {"???","???"},
    /* Y 24 */ {"TAC","TAT"},
    /* - 25 */ {"???","???"}
};
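The table above is laid out so that a one-letter amino acid code can be used directly as a row index (letter minus 'A'), with the column selecting the first or second codon choice. The body of `codon()` is not part of this portion of the listing, so the following is only a minimal sketch of such a lookup. The names `example_codons` and `lookup_codon` are hypothetical, the table holds just four illustrative rows, and unlisted letters yield NULL here rather than the "???" placeholder used in the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of a 26 x 2 codon table: row = letter - 'A',
   column 0 = first-choice codon, column 1 = second-choice codon.
   Rows that are not listed are implicitly NULL. */
static const char *example_codons[26][2] = {
    ['A' - 'A'] = {"GCC", "GCT"},
    ['K' - 'A'] = {"AAG", "AAA"},
    ['M' - 'A'] = {"ATG", "ATG"},
    ['W' - 'A'] = {"TGG", "TGG"}
};

/* Return the codon for an upper-case amino acid letter, or NULL when the
   letter is outside A..Z or has no entry in the example table. */
const char *lookup_codon(char acid_char, int preferred)
{
    assert(preferred == 0 || preferred == 1);
    if (acid_char < 'A' || acid_char > 'Z')
        return NULL;
    return example_codons[acid_char - 'A'][preferred];
}
```

For instance, `lookup_codon('A', 1)` selects the second-choice alanine codon from the example table, while a letter with no row, such as 'B', returns NULL.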

char * error_text[] = {
    /* 00 */  ""
    /* 01 */ ,"ERROR: Input file not found!"
    /* 02 */ ,"ERROR: Memory allocation error"
    /* 03 */ ,"ERROR: File read error"
    /* 04 */ ,"ERROR: Could not create output file"
    /* 05 */ ,"ERROR: Segment overlap must be less than segment length"
};

char disease_name[KEYBOARD_BUFFER_SIZE];
char input_file_name[KEYBOARD_BUFFER_SIZE];
char output_file_name[KEYBOARD_BUFFER_SIZE];
int num_genes = 0;
int num_segments = 0;
int len_segment;
int segment_overlap;
P_GENE first_gene = NULL;
P_GENE_SEGMENT first_segment = NULL;
P_GENE_SEGMENT * scrambled_segments = NULL;

/* Mainline */

int main(void) {
    int error = E_NOERROR;

    printf("Scramble - Version %s. %s\n\n",VERSION_NO,VERSION_DATE);

    /* Initial processing */
    if (!error)
        error = prolog();

    /* Get various program parameters from user */
    if (!error)
        error = get_parameters();

    /* Load genes from genes file */
    if (!error)
        error = load_genes();

    /* Add 'AA' to start and end of all genes */
    if (!error)
        error = add_aa();

    /* Split genes into overlapping chunks */
    if (!error)
        error = split_genes();

    /* Convert segment amino acid to dna */
    if (!error)
        error = convert_segments_aa_to_dna();

    /* Scramble the segments */
    if (!error)
        error = perform_scramble();

    /* Write output file */
    if (!error)
        error = write_output_file();

    /* Show error if there was one */
    if (error)
        printf("%s",error_text[error]);

    return error;
}


/* prolog() */

/* Perform any initial processing required */

int prolog() {
    /* Seed the random number generator, using the system clock */
    /* Don't run the program more than once in the same second! */
    /* Or we'll get the same randomisation!!!!!!!!!!!!!!!!!!!!! */
    srand(time(NULL));

    return E_NOERROR;
}



/* get_parameters() */
/* Ask for various parameters from the user (stdin) */
/*   Disease name */
/*   Input file name */
/*   Output file name */
/*   Segment length */

int get_parameters() {
    int valid;

    read_str("Enter disease name : ",disease_name);
    read_str("Enter input file name : ",input_file_name);
    read_str("Enter output file name : ",output_file_name);

    valid = FALSE;
    while (!valid) {

        len_segment = read_int("Enter segment length : ");

        if (len_segment % 2)
            printf("Segment length must be even!\n");
        else
            valid = TRUE;

    }

    segment_overlap = len_segment / 2;
    return E_NOERROR;

}

/* load_genes() */
/* Load the genes from the input file */

int load_genes() {

    FILE * input_file;
    char name_buf[BUFFER_SIZE];
    char data_buf[BUFFER_SIZE];
    int rc = E_NOERROR;   /* stays E_NOERROR if the file holds no genes */

    /* Open genes file for reading */
    if (NULL == (input_file = fopen(input_file_name,"r")))
        return E_NOINFILE;

    printf("Loading genes from: %s\n",input_file_name);

    num_genes = 0;

    /* Read gene name */
    while (NULL != read_nonblank_line(name_buf,BUFFER_SIZE,input_file)) {
        /* Read the gene data */
        if (NULL != read_nonblank_line(data_buf,BUFFER_SIZE,input_file)) {
            /* Allocate memory for new gene and add to list */
            if (rc = add_gene(name_buf,data_buf))
                break;
        }
    }

    /* Close genes file */
    fclose(input_file);

    return rc;

}

/* add_gene() */

/* Allocate memory for new gene, then insert in list */

int add_gene(char * gene_name,char * gene_data) {
    P_GENE new_gene;

    /* Allocate storage for new gene */
    if (NULL == (new_gene = malloc(sizeof(GENE))))
        return E_MALLOC;
    /* Initialise new gene */
    new_gene->next_gene = NULL;
    /* Allocate storage for gene name (+1 for null) */
    if (NULL == (new_gene->name = malloc(strlen(gene_name)+1)))
        return E_MALLOC;
    /* Store gene name */
    strcpy(new_gene->name,gene_name);
    /* Allocate storage for gene data (+1 for null) */
    if (NULL == (new_gene->data = malloc(strlen(gene_data)+1)))
        return E_MALLOC;
    /* Store gene data */
    strcpy(new_gene->data,gene_data);
    /* Insert the new gene into linked list */
    insert_gene(&first_gene,new_gene);
    /* Increment num_genes */
    num_genes++;

    return E_NOERROR;

}

/* insert_gene() */

/* Insert gene into linked list */

void insert_gene(P_GENE * head_gene,P_GENE new_gene) {
    P_GENE * cur_ptr = head_gene;

    while (NULL != (*cur_ptr))
        cur_ptr = &((*cur_ptr)->next_gene);

    *cur_ptr = new_gene;

}

/* add_aa() */

/* Add 'AA' to the start and end of every gene */

int add_aa() {

    P_GENE cur_gene = first_gene;
    char * new_data;

    while (NULL != cur_gene) {

        /* Allocate storage to fit the gene plus four characters */
        new_data = malloc(strlen(cur_gene->data)+5);

        /* Shift gene data to new storage, add "AA" */
        strcpy(new_data,"AA");
        strcat(new_data,cur_gene->data);
        strcat(new_data,"AA");

        /* Free previous gene data storage */
        free(cur_gene->data);

        /* Set gene data pointer to new storage */
        cur_gene->data = new_data;

        /* Advance to next gene */
        cur_gene = cur_gene->next_gene;

    }

    return E_NOERROR;

}

/* split_genes() */

/* Split the genes into overlapping segments */

int split_genes() {

    P_GENE cur_gene = first_gene;
    P_GENE_SEGMENT cur_seg = first_segment;

    printf("Splitting genes into segments...\n");

    /* Split the genes into segments */
    while (NULL != cur_gene) {

        /* Split the gene */
        split_gene(cur_gene);
        /* Advance to next gene */
        cur_gene = cur_gene->next_gene;

    }

    /* Count the number of segments */
    num_segments = 0;
    cur_seg = first_segment;
    while (NULL != cur_seg) {

        num_segments++;

        cur_seg = cur_seg->next_seg;

    }

    return E_NOERROR;

}

/* split_gene() */

/* Split a gene into overlapping segments */

int split_gene(P_GENE g) {
    char * seg_ptr;
    char * seg_buf;
    P_GENE_SEGMENT new_segment = NULL;
    int done;
    int seg_ctr = 0;

    /* Allocate memory for segment buffer */
    if (NULL == (seg_buf = malloc(len_segment+1)))
        return E_MALLOC;

    /* Insert a null at the end of the segment buffer. */
    /* so we can use it as a string */
    seg_buf[len_segment] = '\0';

    /* Set segment pointer to start of gene data */
    seg_ptr = g->data;

    done = FALSE;
    while (!(done)) {

        /* So we know if we copied data */
        seg_buf[0] = '\0';

        /* Copy a segment of gene data to the segment buffer */
        memcpy(seg_buf,seg_ptr,len_segment);

        /* If there was some gene data copied to the buffer */
        if (NULL != seg_buf[0]) {

            /* Allocate storage for a new segment */
            if (NULL == (new_segment = malloc(sizeof(GENE_SEGMENT))))
                return E_MALLOC;
            /* Increment segment counter */
            seg_ctr++;

            /* Setup the new segment */
            new_segment->p_gene = g;
            new_segment->number = seg_ctr;
            new_segment->offset = seg_ptr - g->data + 1;
            new_segment->next_seg = NULL;
            if (NULL == (new_segment->amino_data = malloc(len_segment+1)))
                return E_MALLOC;
            if (NULL == (new_segment->dna_data = malloc(len_segment*3+1)))
                return E_MALLOC;
            new_segment->amino_data[0] = '\0';
            new_segment->dna_data[0] = '\0';
            /* Copy segment data from buffer to new segment */
            strcpy(new_segment->amino_data,seg_buf);
            /* Insert new segment into chain from gene */
            insert_segment(&first_segment,new_segment);

        }

        /* If we didn't read a full segment, we are finished! */
        if (strlen(seg_buf) < len_segment)
            done = TRUE;

        /* Otherwise, advance segment pointer to next segment in buffer */
        else
            seg_ptr = seg_ptr + len_segment - segment_overlap;

    }

}
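
The segmentation step can also be sketched without copying past the end of the gene string. The two helpers below are hypothetical (`count_segments` and `copy_segment` do not appear in the listing) and assume the same rules as split_gene(): the window advances by segment length minus overlap, a short final segment is kept, and an empty one is not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of segments a gene of gene_len residues yields.
   Mirrors the loop in split_gene(): stop after the first short (or empty) read. */
size_t count_segments(size_t gene_len, size_t seg_len, size_t overlap)
{
    size_t count = 0, pos = 0;

    if (seg_len == 0 || overlap >= seg_len)
        return 0;                       /* the window would never advance */
    for (;;) {
        size_t remaining = gene_len - pos;   /* pos never exceeds gene_len */
        if (remaining > 0)
            count++;
        if (remaining < seg_len)
            break;                      /* short or empty segment: finished */
        pos += seg_len - overlap;
    }
    return count;
}

/* Hypothetical helper: copy at most seg_len residues starting at 0-based
   position pos into out (capacity seg_len + 1), never reading beyond the
   terminating null of gene. Returns the number of residues copied. */
size_t copy_segment(const char *gene, size_t pos, size_t seg_len, char *out)
{
    size_t gene_len, n = 0;

    assert(gene != NULL && out != NULL);
    gene_len = strlen(gene);
    if (pos < gene_len) {
        n = gene_len - pos;
        if (n > seg_len)
            n = seg_len;
        memcpy(out, gene + pos, n);
    }
    out[n] = '\0';
    return n;
}
```

Under these assumptions a padded gene of 3015 residues (an assumed length), cut with a segment length of 30 and an overlap of 15, would give 201 segments.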

/* insert_segment() */

/* Insert a segment node at the end of the list */

int insert_segment(P_GENE_SEGMENT * head_seg,P_GENE_SEGMENT new_seg) {
    P_GENE_SEGMENT * cur_ptr = head_seg;

    while (NULL != (*cur_ptr))
        cur_ptr = &((*cur_ptr)->next_seg);

    *cur_ptr = new_seg;

}
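
insert_gene() and insert_segment() both append to a singly linked list by walking a pointer to the link field rather than a pointer to the node, so an empty list needs no special case. A minimal sketch of that idiom follows; the `NODE` type and `list_append` are hypothetical stand-ins, not names from the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type standing in for GENE or GENE_SEGMENT. */
typedef struct node {
    int value;
    struct node *next;
} NODE;

/* Append new_node at the tail of the list whose head pointer is *head.
   cur always points at the link that would have to change: first the head
   pointer itself, then the next field of each node in turn. */
void list_append(NODE **head, NODE *new_node)
{
    NODE **cur = head;

    assert(head != NULL && new_node != NULL);
    while (*cur != NULL)
        cur = &(*cur)->next;
    new_node->next = NULL;
    *cur = new_node;
}
```

Each append walks the whole list, so building a list of n nodes this way costs on the order of n squared steps; keeping a tail pointer would avoid the walk if that ever mattered.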

/* convert_segments_aa_to_dna */

/* Go thru segments, and for each, convert amino acids to dna */

int convert_segments_aa_to_dna() {

    P_GENE_SEGMENT cur_seg = first_segment;
    int first_choice = 1;
    int alternate;

    printf("Converting to DNA...\n");

    /* Work out if we need to alternate the first codon choice or not */
    /* Don't need to do this anymore, since the segment length is */
    /* forced to be even, and the overlap is half the length (odd). */
    /*alternate = ((even(len_segment) && even(segment_overlap))
               || (!even(len_segment) && !even(segment_overlap)));*/

    alternate = FALSE;

    while (NULL != cur_seg) {

        cur_seg->first_codon_choice = first_choice;
        convert_aa_to_dna(cur_seg->amino_data,cur_seg->dna_data,
                          cur_seg->first_codon_choice);
        /* Address next segment */
        cur_seg = cur_seg->next_seg;

        /* If we are alternating, alternate the first codon choice */
        /*if (alternate)
            if (1 == first_choice)
                first_choice = 2;
            else
                first_choice = 1;*/

    }

    return E_NOERROR;

}

/* convert_aa_to_dna */

/* Converts a string of amino acid to dna */

/* NOTE: assumes that buffer at dna_ptr is large enough to hold dna!!! */

int convert_aa_to_dna(char * aa_ptr,char * dna_ptr,int first_choice) {
    char * p_codon;
    int cur_preferred = first_choice;

    while ('\0' != *aa_ptr) {

        p_codon = codon(*aa_ptr,cur_preferred);
        strcat(dna_ptr,p_codon);
        /* If we didn't find a codon, log a warning */
        if (0 == strcmp(p_codon,"???"))
            printf("WARNING: no codon found for amino acid!\n");

        /* Alternate current preferred codon */
        if (1 == cur_preferred)
            cur_preferred = 2;
        else
            cur_preferred = 1;

        aa_ptr++;

    }

    return E_NOERROR;

}



/* codon */

/* Returns a pointer to a codon corresponding to the amino acid passed */
/* The codon pointer is to 3 characters, plus a terminating null */

char * codon(char acid_char,int preferred) {
    int codon_table_index;
    char * codon_ptr;

    /* Determine index into codon_table (table starts at 'A') */
    codon_table_index = acid_char - 'A';

    /* Set pointer to appropriate codon */
    codon_ptr = codon_table[codon_table_index][preferred-1];
    return codon_ptr;

}
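
The back-translation above alternates between the two codon choices from one residue to the next. The sketch below illustrates that alternation with an explicit output capacity; `lookup_codon` and `aa_to_dna` are hypothetical names, and only the seven table rows visible in this part of the listing (Q, R, S, T, V, W, Y) are reproduced, so every other residue maps to "???" here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup over the table rows shown above; preferred is 1 or 2. */
static const char *lookup_codon(char acid, int preferred)
{
    static const struct { char acid; const char *codon[2]; } rows[] = {
        {'Q', {"CAG", "CAA"}}, {'R', {"AGG", "AGA"}}, {'S', {"AGC", "TCC"}},
        {'T', {"ACC", "ACA"}}, {'V', {"GTG", "GTC"}}, {'W', {"TGG", "TGG"}},
        {'Y', {"TAC", "TAT"}}
    };
    size_t i;

    for (i = 0; i < sizeof rows / sizeof rows[0]; i++)
        if (rows[i].acid == acid)
            return rows[i].codon[preferred - 1];
    return "???";                       /* same placeholder as the listing */
}

/* Write the DNA for aa into dna (capacity dna_cap bytes), alternating the
   codon choice after every residue. Returns the number of residues that had
   no codon, or -1 if dna is too small. */
int aa_to_dna(const char *aa, char *dna, size_t dna_cap, int first_choice)
{
    int preferred = first_choice, missing = 0;
    size_t used = 0;

    assert(aa != NULL && dna != NULL);
    assert(first_choice == 1 || first_choice == 2);
    if (dna_cap == 0)
        return -1;
    dna[0] = '\0';
    for (; *aa != '\0'; aa++) {
        const char *c = lookup_codon(*aa, preferred);
        if (used + 3 + 1 > dna_cap)
            return -1;                  /* no room for codon plus null */
        memcpy(dna + used, c, 3);
        used += 3;
        dna[used] = '\0';
        if (strcmp(c, "???") == 0)
            missing++;
        preferred = (preferred == 1) ? 2 : 1;
    }
    return missing;
}
```

Alternating the choice means two neighbouring copies of the same residue receive different codons, which matches the pattern visible in the output listing (for example T T encoded as ACC ACA).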

/* display_genes() */

/* Display the name and data for all genes */

int display_genes() {

    P_GENE cur_gene = first_gene;

    while (NULL != cur_gene) {

        printf("%s\n",cur_gene->name);
        printf("%s\n",cur_gene->data);
        cur_gene = cur_gene->next_gene;

    }

    return E_NOERROR;

}

/* perform_scramble() */
/* Scramble the segments */

/* Check for adjacent segments. If there are, rescramble */

int perform_scramble() {

    int done = FALSE;
    int rc = E_NOERROR;

    while (TRUE) {

        rc = scramble_segments();
        if (E_NOERROR == rc)
            if (adjacent_segments()) {
                printf("Adjacent segments detected! Rescramble? (y/n) ");
                if (!user_confirmation()) {
                    printf("WARNING: Adjacent segments in output file.\n");
                    break;
                }
            }
            else
                break;
        else
            break;

    }

    return rc;

}

/* scramble_segments() */

/* Randomly scramble the segments, putting pointers in scrambled_segments[] */

int scramble_segments() {

    P_GENE_SEGMENT cur_seg = first_segment;
    int i,j;
    P_GENE_SEGMENT temp;

    printf("Scrambling segments...\n");

    /* Allocate storage for array of segment pointers */
    if (NULL == (scrambled_segments = malloc(sizeof(P_GENE_SEGMENT)*num_segments)))
        return E_MALLOC;

    /* First, initialise scrambled_segments in same order as linked list */
    i = 0;
    while (cur_seg != NULL) {

        scrambled_segments[i++] = cur_seg;
        cur_seg = cur_seg->next_seg;

    }

    /* Now, randomly scramble the segments */
    for (i=0;i<num_segments;i++) {

        j = rand() % num_segments;
        temp = scrambled_segments[i];
        scrambled_segments[i] = scrambled_segments[j];
        scrambled_segments[j] = temp;

    }

    return E_NOERROR;

}
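
The loop above swaps every position with a position drawn from the whole array. That mixes the order, but it does not make every ordering equally likely. If a uniform permutation were wanted, a Fisher-Yates shuffle is one common alternative. The sketch below is not the listing's method: `shuffle_ints` is a hypothetical name, and it works on an int array rather than on segment pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fisher-Yates shuffle: for each position from the end, swap it with a
   position chosen from the part of the array not yet fixed (itself included).
   rand() % i still carries a small modulo bias when i does not divide
   RAND_MAX + 1; that is ignored in this sketch. */
void shuffle_ints(int *items, size_t n)
{
    size_t i;

    assert(items != NULL || n == 0);
    for (i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;  /* 0 <= j < i */
        int tmp = items[i - 1];
        items[i - 1] = items[j];
        items[j] = tmp;
    }
}
```

As in prolog(), the generator would be seeded once with srand() before the first shuffle.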

/* adjacent_segments() */

/* Determine if the scrambled segment order has resulted in */
/* two segments which were adjacent originally (ie every */
/* second one) have ended up adjacent. */

int adjacent_segments() {
    int i;
    int rc = 0;
    P_GENE_SEGMENT cur_seg;
    P_GENE_SEGMENT next_seg;

    for (i=0;i<num_segments-1;i++) {

        /* Address current and next segments */
        cur_seg = scrambled_segments[i];
        next_seg = scrambled_segments[i+1];

        /* Do segments come from same gene, and are two apart? */
        if (((cur_seg->p_gene == next_seg->p_gene)
            && ((cur_seg->number == (next_seg->number)+2)
               || (cur_seg->number == (next_seg->number)-2))))
            return 1;

    }

    return 0;

}
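
Because the overlap is half the segment length, two segments of the same gene whose numbers differ by exactly two sit end to end in the parent sequence, which is the case the check above looks for. A self-contained sketch of the same test follows; `SEG_REF` and `has_adjacent` are hypothetical names, with an integer gene id standing in for the gene pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scrambled segment: which gene it came from
   and its 1-based number within that gene. */
typedef struct {
    int gene_id;
    int number;
} SEG_REF;

/* Return 1 if any two neighbours in order[] come from the same gene and
   have segment numbers exactly two apart, otherwise 0. */
int has_adjacent(const SEG_REF *order, size_t n)
{
    size_t i;

    assert(order != NULL || n == 0);
    for (i = 0; i + 1 < n; i++) {
        int diff = order[i].number - order[i + 1].number;
        if (order[i].gene_id == order[i + 1].gene_id
            && (diff == 2 || diff == -2))
            return 1;
    }
    return 0;
}
```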

/* write_output_file() */

/* Write out segments (in initial non-scrambled order) */
/* Write out synthetic protein (in scrambled order) */
/* Write out synthetic dna (in scrambled order) */

int write_output_file() {
    FILE * output_file;
    char * amino_buffer;
    P_GENE_SEGMENT cur_seg;
    int i;

    /* Open output file for writing (erase any contents) */
    if (NULL == (output_file = fopen(output_file_name,"w")))
        return E_CREATE_OUTPUTFILE;

    /* Allocate memory for padded amino string buffer */
    if (NULL == (amino_buffer = malloc(len_segment*3+1)))
        return E_MALLOC;

    printf("Writing output file: %s\n",output_file_name);

    /* Write output file header information */
    fprintf(output_file,"Scramble %s - Output File\n",VERSION_NO);
    fprintf(output_file,"\n");
    fprintf(output_file,"Disease name : %s\n",disease_name);
    fprintf(output_file,"Input filename : %s\n",input_file_name);
    fprintf(output_file,"Output filename : %s\n",output_file_name);
    fprintf(output_file,"Number genes : %d\n",num_genes);
    fprintf(output_file,"Number segments : %d\n",num_segments);
    fprintf(output_file,"Segment length : %d\n",len_segment);
    fprintf(output_file,"Segment overlap : %d\n",segment_overlap);

    /* Write out segments in initial non-scrambled order */
    fprintf(output_file,"\n");
    fprintf(output_file,"Segments in original order:\n");
    fprintf(output_file," \n");

    cur_seg = first_segment;
    while (NULL != cur_seg) {

        /* Format amino data to line up with codons */
        pad_amino_string(cur_seg->amino_data,amino_buffer);
        fprintf(output_file,"Gene : %s\n",cur_seg->p_gene->name);
        fprintf(output_file,"Segment# : %d\n",cur_seg->number);
        fprintf(output_file,"Offset : %d\n",cur_seg->offset);
        fprintf(output_file,"1st Codon : %d\n",cur_seg->first_codon_choice);
        fprintf(output_file,"%s\n",amino_buffer);
        fprintf(output_file,"%s\n",cur_seg->dna_data);
        fprintf(output_file,"\n");
        cur_seg = cur_seg->next_seg;

    }

    /* Write out segment names in scrambled order */
    fprintf(output_file,"Segments in scrambled order:\n");
    fprintf(output_file," \n");

    for (i=0;i<num_segments;i++) {

        /* Format amino data to line up with codons */
        pad_amino_string(scrambled_segments[i]->amino_data,amino_buffer);
        /* Write segment details */
        fprintf(output_file,"%s #%d\n",scrambled_segments[i]->p_gene->name,
                scrambled_segments[i]->number);
        fprintf(output_file,"%s\n",amino_buffer);
        fprintf(output_file,"%s\n",scrambled_segments[i]->dna_data);
        fprintf(output_file,"\n");

    }

    /* Write synthetic protein in one long string */
    fprintf(output_file,"Synthetic Protein:\n");
    fprintf(output_file," \n");

    for (i=0;i<num_segments;i++)
        fprintf(output_file,"%s",scrambled_segments[i]->amino_data);

    fprintf(output_file,"\n\n");

    /* Write synthetic dna in one long string */
    fprintf(output_file,"Synthetic DNA:\n");
    fprintf(output_file," \n");

    for (i=0;i<num_segments;i++)
        fprintf(output_file,"%s",scrambled_segments[i]->dna_data);

    return E_NOERROR;

}



/* strip_newline() */

/* Replace the first newline character with a null */

void strip_newline(char * strip_str) {
    char * newline_pos;

    /* Find the newline char */
    newline_pos = strchr(strip_str,'\n');

    /* If we found one, replace it with a null */
    if (NULL != newline_pos)
        newline_pos[0] = '\0';

}

/* pad_amino_string */

/* Copy amino chars from amino_ptr to padded_ptr, padding each */
/* side with a space. */

void pad_amino_string(char * amino_ptr, char * padded_ptr) {

    while ('\0' != *amino_ptr) {

        *padded_ptr = ' ';
        padded_ptr++;
        *padded_ptr = *amino_ptr;
        padded_ptr++;
        *padded_ptr = ' ';
        padded_ptr++;
        amino_ptr++;

    }

    /* Stick a null at the end of the padded string */
    *padded_ptr = '\0';

}
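
Padding each residue to three characters lines it up with its three-letter codon in the output file. The routine above trusts the caller to supply a buffer of at least three times the amino length plus one. A variant that checks the capacity might look like the sketch below; `pad_amino` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Write " X " for every residue X of amino into out (capacity out_cap).
   Returns 0 on success, or -1 if out cannot hold the padded string. */
int pad_amino(const char *amino, char *out, size_t out_cap)
{
    size_t used = 0;

    assert(amino != NULL && out != NULL);
    for (; *amino != '\0'; amino++) {
        if (used + 3 + 1 > out_cap)
            return -1;                  /* need 3 chars plus the final null */
        out[used++] = ' ';
        out[used++] = *amino;
        out[used++] = ' ';
    }
    if (used + 1 > out_cap)
        return -1;
    out[used] = '\0';
    return 0;
}
```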



/* even() */

/* True if test num is even, otherwise false */
int even(int test_num) {

    return !(test_num % 2);

}

/* read_int() */

/* Read an integer from stdin. Keep trying until valid int > 0 entered. */
/* Return the integer read, or 0 if error reading from stdin. */

int read_int(char * prompt) {

    char buffer[KEYBOARD_BUFFER_SIZE];
    int value_read;
    int valid = FALSE;

    while (!valid) {

        printf("%s",prompt);
        valid = TRUE;

        fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
        if (1 != sscanf(buffer,"%d",&value_read))
            valid = FALSE;
        if (valid && (value_read < 1))
            valid = FALSE;

        if (!valid)
            printf("Positive integer value please!\n");

    }

    return value_read;

}
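
The sscanf("%d") test above accepts input such as "30abc" and does not detect values too large for an int. A stricter check can be sketched with strtol; `parse_positive_int` is a hypothetical helper, not part of the listing, and it rejects trailing characters other than whitespace.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a whole line as a positive int. Returns 1 and stores the value on
   success; returns 0 and leaves *value untouched on any failure. */
int parse_positive_int(const char *text, int *value)
{
    char *end;
    long v;

    assert(text != NULL && value != NULL);
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text)
        return 0;                       /* no digits at all */
    if (errno == ERANGE || v < 1 || v > INT_MAX)
        return 0;                       /* out of range or not positive */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return 0;                       /* trailing junk such as "30abc" */
    *value = (int)v;
    return 1;
}
```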

/* read_str() */

/* Read a string from the user (stdin) */
/* Strip the newline from it */

void read_str(char * prompt,char * string) {

    char buffer[KEYBOARD_BUFFER_SIZE];

    printf(prompt);

    fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
    sscanf(buffer,"%s",string);

}



/* read_nonblank_line() */

/* Read a line from file until we get a non-blank one */

char * read_nonblank_line(char * buf,int buf_size,FILE * in_file) {
    char * return_ptr;

    /* Read lines until we get a non-blank one, or EOF */
    do
        return_ptr = fgets(buf,buf_size,in_file);
    while ((NULL != return_ptr) && (('\n' == buf[0]) || (' ' == buf[0])));

    /* If we got a line, change the newline char to a null */
    if (NULL != return_ptr)
        strip_newline(buf);

    return return_ptr;

}



/* user_confirmation() */

/* Read input from user. If user types 'y', return 1, otherwise 0 */

int user_confirmation() {

    char buffer[KEYBOARD_BUFFER_SIZE];

    fgets(buffer,KEYBOARD_BUFFER_SIZE,stdin);
    if (('y' == buffer[0]) || ('Y' == buffer[0]))
        return 1;
    else
        return 0;

}

/* test() */

/* For debugging/development */
void test() {

    char str[100];

    printf("Enter something: ");
    fgets(str,100,stdin);
    printf("line1\n");
    printf("%s",str);
    printf("line2\n");
    fgets(str,100,stdin);

}
HepC 1a consensus polyprotein sequence used for scramble program

MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGRRQPIPKARRPEGRTWAQ
PGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKVIDTLTCGFADLMGYIPLVGAPLGGAARALAHGVR
VLEDGVNYATGNLPGCSFSIFLLALLSCLTVPASAYQVRNSTGLYHVTNDCPNSSIVYEAADAILHTPGCVPCVREGN
ASRCWVAMTPTVATRDGKLPATQLRRHIDLLVGSATLCSALYVGDLCGSVFLVGQLFTFSPRRHWTTQGCNCSIYPGH
ITGHRMAWDMMMNWSPTAALVMAQLLRIPQAILDMIAGAHWGVLAGIAYFSMVGNWAKVLVVLLLFAGVDAETHVTGG

NAGRTTSGLVSLLTPGAKQNIQLINTNGSWHINSTALNCNESLNTGWLAGLFYQHKFNSSGCPERLASCRRLTDFDQG
WGPISYANGSGPDQRPYCWHYPPKPCGIVPAKSVCGPVYCFTPSPVVVGTTDRSGAPTYSWGANDTDVFVLNNTRPPL
GNWFGCTWMNSTGFTKVCGAPPCVIGGAGNNTLHCPTDCFRKHPEATYSRCGSGPWITPRCLVDYPYRLWHYPCTINY
TIFKVRMYVGGVEHRLEAACNWTRGERCDLEDRDRSELSPLLLSTTQWQVLPCSFTTLPALSTGLIHLHQNIVDVQYL
YGVGSSIASWAIKWEYVVLLFLLLADARVCSCLWMMLLISQAEAALENLVILNAASLAGTHGLVSFLVFFCFAWYLKG
RWVPGAVYALYGMWPLLLLLLALPQRAYALDTEVAASCGGVVLVGLMALTLSPYYKRYISWCLWWLQYFLTRVEAQLH
VWVPPLNVRGGRDAVILLMCVVHPTLVFDITKLLLAVFGPLWILQASLLKVPYFVRVQGLLRICALARKMIGGHYVQM
AIIKLGALTGTYVYNHLTPLRDWAHNGLRDLAVAVEPVVFSQMETKLITWGADTAACGDIINGLPVSARRGREILLGP
ADGMVSKGWRLLAPITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTAAQTFLATCINGVCWTVYHGAGTRTIAS
PKGPVIQMYTNVDQDLVGWPAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLL
CPAGHAVGIFRAAVCTRGVAKAVDFIPVENLETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAQG
YKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGSPITYSTYGKFLADGGCSGGAYDIIICDECHSTDATSI

lgigtvldqaetagarlvvlatatppgsvtvphpnieevalsttgeipfygkaiplevikggrhlifchskkk^dela 

aklvalginavayyrgldvsviptsgdvvwatdalmtgytgdfdsviix:ntcvtqtvdfsldptftietttlpqdav 

s r tqrrg rtgrg k pg i y r f vapg e r p sgm fds s vl cec yd ag c a w y e lt p ae ttvrlra ymnt pg l p vcqdhle f weg 

vftglthidahflsqtkqsgenfpylvayqatvcaraqapppswdqmwkclirlkptlhgptpllyrlgavqnevtlt 

hpvtkyimtcmsadlewtstwviivggvliaalaayclstgcvvivgrivlsgkpaiipdrevlyrefdemeecsqhlp 

yieqgmmlaeqfkqkalgllqtasrqaeviapavqtnvk)klevfwakh^ 

aavtsplttsqtllfnilggwvaaql^pgaatafvgaglagaaigsvglgkvlvdilagygagvagalvafkimsge 

vpstedlvnllpailspgalwgwcaailrrhvgpgegavqwmrliafasrgnhvspthyvpesdaaarvtailss 

ltvtqllrrlhqwissecttpcsgswlrdivtowicevlsdfktwlkaklmpqlpgipfvscqrgykgvwrgtc 

chcg ae i tg hvkngtmr i vg prtcrnmw s gt f p i n a yttg p ct p l p apn ytf alw r vs ae e yve i rr vg d fh y vtgmt 

tdnlkcpcqvpspeffteldgvrlhrfappckpllreevsfrvglheypvgsqlpcepepdvavltsmltdpshitae 

aagrrlajtcsppsmasssasqlsapslkatctanhdspdaelieanllwrqemggnitrv^ 

edereisvpaeilrksrrfaqal.pvwarpdynpplvetwkkpdyeppwhgcplppprsppvppprkkrtwltestl 

STALAELATKSFGSSSTSGITGDNTTTSSEPAPSGCPPDSDAESYSSMPPLEGEPGDPDLSDGSWSTVSSEAGTEDVV
CCSMSYSWTGALVTPCAAEEQKLPINALSNSLLRHHNLVYSTTSRSACQRQKKVTFDRLQVLDSHYQDVLKEVKAAAS 
KVKANLLSVEEACSLTPPHSAKSKFX3YGAKDVRCHARKAVAHINSVWKDLLEDSVTPIDTTIMAKNEVFC 
KPARLIVFPDLGVRVCEKMALYDV^SKLPLAVMGSSYGFQYSPGQRVEFLVQAWKSKOT 

IRTEEAIYQCCDLDPQARVAIKSLTERLYVGGPLTNSRGENCGYRRCRASGVLTTSCGNTLTCYIKARAACRAAGLQD 
CTMLVCGDDLVVICESAGVQEDAASLRAFTEAMTRYSAPPGDPPQPEYDLELITSCSSNVSVAHDGAGKRVYYLTRDP 
TTPLARAAWETARHTPVNSWLGNIIMFAPTLWARMILMTHFFSVLIARDQLEQALDCEIYGACYSIEPLDLPPIIQRL
HGLSAFSLHSYSPGEINRVAACLRKLGVPPLRAWRHRARSVRARLLARGGRAAICGKYLFNWAVRTKLKLTPIAAAGR 
LDLSGWFTAGYSGGDIYHSVSHARPRWFWFCLLLLAAGVGIYLLPNR 



Scramble - Output File

Scramble version : 0.1 beta, 08/02/1999
Num. genes : 1

Num. segments : 201
Segment length : 30
Segment overlap : 15
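
The Offset values in the listing that follows can be checked against the header: with a segment length of 30 and an overlap of 15, each segment starts 15 residues after the previous one, counted from 1 in the padded gene. A small sketch of that arithmetic follows; `segment_offset` is a hypothetical helper, not part of the program.

```c
#include <assert.h>

/* 1-based offset of segment number `number` (also 1-based) in the padded
   gene, for a given segment length and overlap. */
long segment_offset(long number, long seg_len, long overlap)
{
    assert(number >= 1 && overlap >= 0 && seg_len > overlap);
    return (number - 1) * (seg_len - overlap) + 1;
}
```

For example, segment 2 starts at offset 16 and segment 14 at offset 196, as shown in the listing.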

Segments in original order: 



Gene : HepCla

Segment # : 1
Offset : 1
1st Codon : 1

AAMSTNPKPQRKTKRNTNRRPQDVKFPGGG
GCCGCTATGTCCACCAATCCCAAACCCCAAAGGAAAACCAAAAGGAATACC
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Gene : HepCla 

Segments : 2 
Offset : 16 
1st Codon : 1 

NTNRRPQDVKFPGGGQ IVGGVYLLPRRGPR 
AACACAAACAGAAGGCCTCAGGATGTGAAATTCCCTGGCGGAGGCCAAATC^ 

Gene : HepCla 

Segments -. 3 
Offset : 31 
1st Codon : 1 

QIVGGVYLLPRRGPRLGVRATRKT SERSQP 
CAGATTGTGGGAGGCGTCTACCTCCTGCCTAGGAGAGGCCCTAGGCTCGGCGTCAGGGCTACCAGAAAGACAAGCGA 

Gene : HepCla 

Segments : 4 
Offset : 46 
1st Codon : 1 

LGVRATRKTSERSQPRGRRQPI PKARRPEG 
CTGGGAGTGAGAGCCACAAGGAAAACCTCCGAGAGAAGCCAACCCAGAGGCAGAAC^CAACCCATC 

Gene : HepCla 

Segments : 5 
Offset : 61 
1st Codon : 1 

RGRRQPI PKARRPEGRTWAQPGYPWPLYGN 
AGGGGAAGGAGACAGCCTATCCCTAAGGCTAGGAGACCCGAAGGCAGAACCTGGGCCCAACCCGGATACCCTTGGCCTCTG 

Gene : HepCla 

segments : 6 
Offset : 76 
1st Codon : 1 

RTWAQPGY PWPLYGNEGCGWAGWLLSPRGS 
AGGACATGGGCTCAGCCTGGCTATCCCTGGCCCCTCTACGGAAACGAAGGCT^ 

Gene : HepCla 

Segments : 7 
Offset : 91 
1st Codon : 1 

EGCGWAGWLLS PRGSR PSWG PTDPRRRS RN 
GAGGGATGCGGATGGGCTGKaCTGGCTGCTCAGCCCTAGGGGAAGCAGACCCTCCT 

Gene : HepCla 

Segments : 8 
Offset : 106 
1st Codon : 1 

RP SWGPTD PRRR SRNLGKV I DTLTCGFADL 
AGGCCTAGCTGGGGCCCTACCGATCCCAGAAGGAGAAGCAGAAACCTCG 

Gene : HepCla 

Segments : 9 
Offset : 121 

1st Codon : 1 

LGKVIDTLTCGFADLMGYI PLVGAPLGGAA 
CTGGGAAAGGTCATCGATACCCTCACCTGOXMCTTCGCCGATCTG 

Gene : HepCla 

Segment^ : 10 
Offset : 136 

1st Codon : 1 

MGY1 PLVGAPLGGAARALAHGVRVLEDGVN 
ATGGGATACATTCCCCTCGTGGGAGCCCCTCTGGGAGGCGCTGCCAGAGCCCTC^ 

Gene : HepCla 

Segments : 11 
Offset : 151 
1st Codon : 1 

RALAHGVRVLEDGVNYATGNLPGCS FSI FL 
AGGGCTCTGG CTCACGG AGTG AGAGTGCTCG AGGATGGCGTCAACT ATGCCACAGG CAATCTGCCTGGCTGT AG CTTTAGCATTTT CCTC 

Gene : HepCla 

Segments : 12 
Offset : 166 
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1st Codon : 1 

YATGNLPGCS F S I F LLALLS C LTV P A SA Y Q 
TACGCTACCC^AAACCTCCCCGGATGCTCCTTCTCCATCTTC^ 



Gene : HepCla 

Segment U : 13 
Offset : 181 

1st Codon : 1 

LALLSCLTVPASAYQVRNSTGLYHVTNDCP 
CTGGCTCTGCTCAGCTCTCTCACAGTGCCTGCCTCrc 



Gene : HepCla

Segment # : 14
Offset : 196
1st Codon : 1

VRNSTGLYHVTNDCPNSSIVYEAADAILHT
GTGAGAAACTCCACCGGACTGTATCACGTCACCAATGACTGTCCCAATAGCTCCATCGTCTACGAAGCCGCTGACGCTATCCTCCACACA

Gene : HepCla

Segment # : 15
Offset : 211
1st Codon : 1

NSSIVYEAADAILHTPGCVPCVREGNASRC
AACTCCAGCATTGTGTATGAGGCTGCCGATGCCATTCTGCATACCCCTGGCTC



Gene : HepCla 

Segment # r 16 
Offset : 226 
1st Codon : 1 

PGCVPCVR EGNAS RCWVAM T PTVATR DGKL 
CCCGGATGCGTCCCCTGTGTGAGAGAGGGAAACGCTAGCAGATGCTGGGTC 



Gene : HepCla 

Segments : 17 
Offset : 241 
1st Codon : 1 

WVAMTPTVATRDGKLPATQLRRH, IDLLVGS 
TGGGTCGCC^TGACCCCTACCGTCGCCACAAGGGATGGCAAACTG^ 



Gene : HepCla 

Segments : 18 
Offset : 256 

1st Codon : 1 

PATQLRRH1DLLVGSATLCSALYVGDLCGS 
CCCGCTACCCAACTGAGAAGGCATATCGATCTGCTCGTGGGAAGCGCTACCCTCTGCTCCGCCCTCTACGTCGGCGATCTGTGTGGCTCC 



Gene : HepCla 

Segment^ : 19 
Offset : 271 
1st Codon : 1 

ATLCSALYVGDLCGSVFLVGOLFTFSPRRH 
GCCACACTGTGTAGCGCTCTGTATGTGGGAGACCTCTGC^ 



Gene : HepCla 

Segments : 20 
Offset : 286 
1st Codon : 1 

VFLV GQLFTFS PRRHWTTQGCNCS IYPGH I 
GTGTTTCTGGTCGGCCAACTGTTTACCTTTAGCCCTAGG 



Gene : HepCla 

Segments : 21 
Offset : 301 
1st Codon : 1 

WTTQGCNCSIYPGHITGHRMAWDMMMNWSP
TGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCCATATCACAGGCC^

Gene : HepCla

Segment # : 22
Offset : 316
1st Codon : 1

TGHRMAWDMMMNWSPTAALVMAQLLRIPQA
ACCGGACACAGAATGGCTTGGGATATGATGATGAATTGGTCCCCCACAGCCGCTCTGGT

Gene : HepCla

Segment # : 23
Offset : 331
1st Codon : 1

TAALVMAQLLRIPQAILDMIAGAHWGVLAG
ACCGCTGCCCTCGTGATGGCCCAACTGCTCAGGATTCCCCAAGCCAT^

Gene : HepCla

Segment # : 24
Offset : 346
1st Codon : 1

ILDMIAGAHWGVLAGIAYFSMVGNWAKVLV
ATCCTCGACATGATCGCTGGCGCTCACTGGGGCGTCCTGGCTGG^

Gene : HepCla 

Segments . 25 
Offset : 361 
1st Codon : 1 

I AYFSMVGNWAKVLVVLLLFAGVDA ETHVT 
ATCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAA^ 

Gene : HepCla 

Segment# : 26 
Offset : 376 
1st Codon : 1 

VLLLFAGVDAETHVTGGNAGRTTSGLVSLL 



Gene : HepCla 

Segments : 27 
Offset : 391 
1st Codon : 1 

GGNAGRTTSGLVSLLTPGAKQNI QLINTNG 
GGCGG AAACGCTGGCAGAACCACAAGCGGACTGGTCAGCCTTCCTGACACCCGGAGCCAAACAGAAT AT CCAA 

Gene : HepCla 

Segment* : 28 
Offset : 406 

1st Codon : 1 

TPGAKQNIQLINTNGSWHINSTALNCNESL 
ACCCCTGGCGCTAAGCAAAACATTCAGCTCATCAATACC 

Gene : HepCla 

Segments : 2 9 
Offset : 421 
1st Codon : 1 

SWHINSTALNCNESLNTGWLAGLFYQHKFN 
AGCTGGCACATTAACTCCACaSCTCTGAATTGCAATGAGTCCCTGAATACCGGATG^ 

Gene : HepCla 

Segments : 30 
Offset : 436 

1st Codon : 1 

NTGWLAGLFYQHKFNSSGCPERLASCRRLT 
AACACAGGCTGGCTGGCTGGCCTCTTCTATCAGCATAAGTTTA^ 

Gene : HepCla 

Segments : 31 
Offset : 451 

1st Codon : l 

SSGCPERLASCRRLTDFDQGWGP ISYANGS 
AGCTCCGGCTGTCCCGAAAGGCTCGCCTCCTGCAGAAGGCTCACCGATTTCGATCAGGGATGG 

Gene : HepCla 

Segments : 32 
Offset : 466 

1st Codon : 1 

DFDQGWGPI SYANGSGPDQRPYCWHYPPKP 
GACTTTGACCAAGGCTGGGGCCCTATCTCCTACGCTAACGGA^ 

Gene : HepCla

Segment # : 33
Offset : 481
1st Codon : 1

GPDQRPYCWHYPPKPCGIVPAKSVCGPVYC
GGCCCTGACCAAAGGCCTTACTGTTGGCATTACCCT^

Gene : HepCla

Segment # : 34
Offset : 496
1st Codon : 1

CGIVPAKSVCGPVYCFTPSPVVVGTTDRSG

Gene : HepCla

Segment # : 35
Offset : 511
1st Codon : 1

FTPSPVVVGTTDRSGAPTYSWGANDTDVFV
TTCACACCCTCCCCCGTCGTGGTCGGCAO^CCGATAGGTCCGGCGCTCCCAC^



Gene : HepCla

Segment # : 36
Offset : 526
1st Codon : 1

APTYSWGANDTDVFVLNNTRPPLGNWFGCT
GCCCCTACCTATAGCTGGGGCGCTAACGATACCGATGTGTTTGTGCTCAACAATACCAGACCCCCTCTGGGAAACTGGTTCGGATGCACA



Gene : HepCla 

Segment U : 37 
Offset : 541 
1st Codon : 1 

LNNTRPPLGNW FGCTWMNS TG FTKVCGAPP 
CTGAATAACACAAGGCCTCCCCTCGCCAATTGCTTTGGCTGTACCTC^ATGAATAGCAC^ 



Gene : HepCla

Segment # : 38
Offset : 556
1st Codon : 1

WMNSTGFTKVCGAPPCVIGGAGNNTLHCPT
TGGATGAACTCCACCGGATTCACAAAGGTCTGCGGAGCCCCTCCCTCTGTGATTG^



Gene : HepCla 

Segment^ : 39 
Offset ; 571 
1st Codon : 1 

CVIGGAGNNTLHCPTDCFRKH PEATYS RCG 
TX3CGTCATCGGAGGCGCTGGCAATAACACACTGCATTGCCCTACCGATTGCTTTAGGAA 



Gene : HepCla

Segment # : 40
Offset : 586
1st Codon : 1

DCFRKHPEATYSRCGSGPWITPRCLVDYPY
GACTGTTTCAGAAAGCATCCCGAAGCCACATACTCCTVC^TGTGGCTCCGGCCCTTC



Gene : HepCla 

Segment U : 41 
Offset : 601 
1st Codon : l 

SGPWITPRCLVDYPYRLWHYPCTINYT IFK 
AGCGGACCCTGGATCACACCCAGATGCCTC^TGGATTACCCTTACAGACTGTGGCACTATCCCTGTACCATTAA 



Gene : HepCla

Segment # : 42
Offset : 616
1st Codon : 1

RLWHYPCTINYTIFKVRMYVGGVEHRLEAA
AGGCTCTGGCATTACCCTTGCACAATCAATTACACAATCTTTAAGGTCAGGATGTACGTCGG^

Gene : HepCla

Segment # : 43
Offset : 631
1st Codon : 1

VRMYVGGVEHRLEAACNWTRGERCDLEDRD
GTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCTCGAC^CTGCCTGTAACTGGACCAGAGGCGAAAGGTGTGACCTCGAGGATAGGGAT



Gene : HepCla 

Segments : 4 4 
Offset : 646 

1st Codon : 1 

CNWTRGERCDLEDR DRSELSPLLLSTTQWQ 
TGCAATTGGACAAGGGGAGAGAGATGCGATCTGGAAGACAGAGACAGAAGCGAACTGTCCCCCCTC 



Gene : HepCla

Segment # : 45
Offset : 661
1st Codon : 1

RSELSPLLLSTTQWQVLPCSFTTLPALSTG



Gene : HepCla 

Segments : 4 6 
Offset : 676 

1st Codon : 1 

VLPCSFTTLPALSTGLIHLHQNIVDVQYLY 
GTGCTCCCCTGTAGCITCACCACACTGCCTGCCCTCAGCACAC^ 

Gene : HepCla 

Segments : 4 7 
Offset : 691 
1st Codon : 1 

LI HLHQNIVDVQYL.YGVGSS IASWA I KWEY 
CTGATTCACCTCCACCAAAACATTGTGGATGTGCAATACCT 

Gene : HepCla 

Segments : 4 8 
Offset : 706 
1st Codon : 1 

gvgss-i'aswaikweyvvllfllladarvcs 
ggcgtcc^ctccagcattgcctcctgggctatcaaatgggaatacgtcgtgct 

Gene : HepCla 

Segment* : 4 9 
Offset : 721 

1st Codon : 1 

VVLLFLL.LADARVCSCLWMMLLISQAEAAL 
GTGGTCCTGCTCrTCCTCCTGCTCGCOSATGCCAGAGTGTGTAGCTGTCTGTGGATGATGCTG 

Gene : HepCla 

Segment** : 50 
Offset : 736 

1st Codon : 1 

CIjWMMIiLI SQAEAALENLVI JLNAAS IiAGTH 
TGCCTCTGGATGATGCTCCTGATTAGCCAAGCCGAAGCCGCTCTC^ 



Gene : HepCla 

Segment* : 51 
Offset : 751 

l9t Codon : 1 

ENLVI LNAASLAGTHGLVSFLV F F C F A W Y L, 
G AG AATCTGGTCATCCT C AACGCTGCCTCCCTGGCTGGCACACACC^ ACTGGTCAGCTTTCTGGTCTTCTTTTG CTTTGCCTGGTACCTC 



Gene : HepCla 

Segments : 52 
Offset : 766 

1st Codon : 1 

GLVSFLVFFCFAWYLKGRWV PGAVYALYGM 
GGCCTCGTGTCCTTCCrCGTGTTTTTCTCTTTCGCTTGGTATCT 



Gene : HepCla 

Segments : 53 
Offset : 781 
1st Codon : 1 

KGRWVPGAVYALYGMWPLLLLLLAL PQRAY 
AAGGGAAGGTGGGTGCCTGGCGCTGrrGTATGCCCTCTACXGAATGTGGCCCCTCCTGCT 



Gene : HepCla 
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Segments : 54 
Offset : 796 

1st Codon : 1 

WPLLLLLLALPQRAYALDTEVAASCGGVVL 
TGGCCTCTGCTCCTGCTCCTGCTCGCCCTCCCCCAAAGGGCTTACGCTCTGGATACCGAAGTGGCTGCCT 

Gene : HepCla 

Segments : 55 
Offset : 811 
1st Codon : 1 

ALDTEVAASC6GVVLVGLMALTLS PYYKRY 
GCCCTCGACACAGAGGTCGCCGCTAGCTGTGGCGGAGtGGTCCTGGTCGGCCTCATGGCTCTGACACTGTCCCCCTATTACAAAAGGTAT 

Gene : HepCla 

Segments : 56 
Offset : 826 
1st Codon • 1 

VGLMALTLSPVYKRYISWCLWWLQYFLTRV 
GTGGGACTGATGGCCCTCACCCTCAGCCCTTACTATAAGAGATACATTAGCTGGTC 

Gene : HepCla 

Segment^ : 57 
Offset : 841 
1st Codon : 1 

I SWCLWWLQY FLTRVEAQLHVWVP PLNVRG 
ATCTCCTGGTGTCTGTGGTGGCTCCAGTATTTCCTCACCAGAGTGGAAGCCCAACTGCATGTGTGGGTGCCT 

Gene : HepCla 

Segments : 58 
Offset : 856 
1st Codon : 1 

EAQLHVWVPP LNVRGGRDAVI LLMCVVHPT 
GAGGCTCAGCTCCACGTCTGGGTCCCCCCTCTGAATGTGAGAGGCGGAA 

Gene : HepCla 

Segments : 59 
Offset : 871 

1st Codon : 1 

GRDAV 1LLMC VVH PTLVFDI TKLLLAVFGP 
C^CAGAGACGCTGTGATTCTGCTCATGTGTGTGGTCCACCCTACCCTCGTGTTTC 

Gene : HepCla 

Segments : 60 
Offset : 886 

1st Codon : 1 

LVFDI TKLLLAVFG PLWI L 0 A S LLKV PY FV 
CTGGTCTTCGATATCACAAAGCTCCTGCTCGCCGTCTTCGGACC^ 

Gene : HepCla 

Segments : 61 
Offset : 901 
1st Codon : 1 

LWI LQASLLKVPYFVRVQGLLRICALARKM 
CTGTGGATCCTCCAGGCTAGCCTCCTGAAAGTGCCTTACITTGTGAGAG 

Gene : HepCla 

Segments : 62 
Offset : 916 

1st Codon : 1 

RVQGLLRI CALAR KMIGGHYVQMAI IKLGA 
AGC^TCCAGGGACTGCTCAGGATTTGCGCTCTGGCTAC^AAAATGATTGGCGGACACT 

Gene : HepCla 

Segments : 63 
Offset : 931 
1st Codon : 1 

IGGHYVQMAI IKLGALTGTYVYNHLTPLRD 
ATCC^AGGCCATTACGTCCAGATGGCCATTATCAAACTGGGAGCCCTCA 

Gene : HepCla 

Segments : 64 

Offset : 946 

1st Codon : 1 
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LTGTYVYNHLTPLRDWAHNG LRDLAVAV E P 
CTGACAGGCACATACGTCTACAATCACCTCACCCCTCTGAGAGACTGGGCCCATAACGGACTGAGAGACCTCGCCGTCGCCGTCGAGCCT 

Gene : HepCla 

Segment # : 65 
Offset : 961 
1st Codon : 1 

WAHNGLRDLAVAVEPVVFSQMETKLITWGA 
TGGGCTCACAATGGCCTCAGGGATCTGGCTGTGGCTGTGGAACCrc 

Gene : HepCla 

Segments : 6 6 
Offset : 976 

1st Codon : 1 

VVFSQMETKLITWGADTAACGDI INGLPVS 
GTGGTCTTCTCCCAGATGGAGACAAAGCTCATCACATGGGGAGCCGATACCGCTGCCTGTGGCGATATCATTAACC^ 

Gene : HepCla 

Segments : 67 
Offset : 991 
1st Codon : 1 

DTAACGDI INGLPVSARRGREILLG PADGM 
GACACAGCCGCTTGCGGAGACATTATCAATGGCCTCCCCGTCAGCGCTAC^AGA 

Gene : HepCla 

Segment!* : 68 
Offset : 1006 

1st Codon : 1 

ARRGREI LLGPADGMVSKGWRLLAPITAYA 
GCCAGAAGGGGAAGGGAAATCCTCCTGGGACCCGCTGACGGAATGGTCAGCAA 

Gene : HepCla 

Segment^ : 69 
Offset : 1021 
1st Codon : 1 * • 

VSKGWRLLAPITAYAQQTRGLLGCI ITSLT 
GTGTCCAAGGGATGGAGACTGCTCGCCCCTATCACAGCCTATGCCCAACAGACAAGGGGACT 

Gene : HepCla 

Segments : 70 
Offset : 1036 

1st Codon : 1 

QQT RG LLGCI I TSLTGRDKNQVEGEVQIVS 
CAGCAAACCAGAGGCCTCCTGGGATGCATTATCACAAGCCTCACCGGAAGGGATAAGAATCAGGTCGAGGGAGAGGTCCAGATTGTGTCC 

Gene : HepCla 

Segments : 71 
Offset : 1051 
1st Codon : 1 

GRDKN QVEGEVQIVSTAAQTFLATCINGVC 
GGCAGAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATCGTCAGCACAGCCGCT 

Gene : HepCla 

Segments : 72 
Offset : 1066 

1st Codon : 1 

TAAQT FLATCI NGVCWTVYHGAGTRTIASP 
ACCGCTGCCCAAACCTTTCTCGCTACCTGTATCAAT^ 

Gene : HepCla 

Segments : 73 
Offset : 1081 

1st Codon : 1 

WTVYH GAGTRTIASPKG PV I QMYTNVDQDL 
TGGACAGTGTATCACGGAGCCGGAACCAGAACCATTCCCTCCCCCAAAGGC^ 

Gene : HepCla 

Segments : 74 
Offset : 1096 
1st Codon : 1 

KGPVI QMYTNVDQDLVGWPAPQGSRSLTPC 
AAGGGACCCGTCATCCAAATGTATACCAATGTGGATCAGGATCHX^ 
Gene : HepCla

Segment # : 75
Offset : 1111
1st Codon : 1

VGWPAPQGSRSLTPCTCGSSDLYLVTRHAD 
GTGGGATGGCCTGCCCCTCAGGGAAGCAGAAGCCT<^CCCCTTGCAGATGCGGAAGC^ 

Gene : HepCla 

Segment # : 76
Offset : 1126
1st Codon : 1

TCGSSDLYLVTRHADVIPVRRRGDSRGSLL
ACCTGTGGCTCCAGOSATCTGTATCTGGTCACCAGACACGCTGACGTCATCCCT^ 

Gene : HepCla 

Segment # : 77

Offset : 1141 

1st Codon : 1 

VIPVRRRGDSRGSLLSPRPISYLKGSSGGP 
GTGATTCCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCCTCCTGTCCCCCAGA 

Gene : HepCla 

Segment # : 78
Offset : 1156
1st Codon : 1

SPRPISYLKGSSGGPLLCPAGHAVGIFRAA
AGCCCTAGGCCTATCTCCTACCTCAAGGGAAGCTCCGGCGGACCCCTC 

Gene : HepCla 

Segment # : 79
Offset : 1171
1st Codon : 1

LLCPAGHAVGIFRAAVCTRGVAKAVDFIPV
CTGCTCTGCCCTGCCGGACACGCTGTGGGAATCTTTAGGGCTGCCGTCT 

Gene : HepCla 

Segment # : 80
Offset : 1186
1st Codon : 1

VCTRGVAKAVDFIPVENLETTMRSPVFTDN
GTGTGTACCAGAGGCGTCGCCAAAGCCGTCGACTTTATCCCTGTGGAAAACCTCGAGACAACCATGAGGTCCCCCGTCTTCACAGACAAT 

Gene : HepCla 

Segment # : 81
Offset : 1201
1st Codon : 1

ENLETTMRSPVFTDNSSPPAVPQSFQVAHL
GAGAATCTGGAAACCACAATGAGAAGCCCT6TGTTTACCGATAACTCCAGCCCT 

Gene : HepCla 

Segment # : 82
Offset : 1216 
1st Codon : 1 

SSPPAVPQSFQVAHLHAPTGSGKSTKVPAA 
AGCTCCCCCCCnXTCGTCCCCCAAAGCTTTCAG^ 

Gene : HepCla 

Segment # : 83
Offset : 1231 
1st Codon : 1 

HAPTGSGKSTKVPAAYAAQGYKVLVLNPSV 
CACGCTCCCACAGGCTCCGGCAAAAGCACAAAGGTCCCCGCTGCCTATGCCGCTCAGGGATACAAAGTGCTC^ ' 

Gene : HepCla 

Segment # : 84
Offset : 1246
1st Codon : 1

YAAQGYKVLVLNPSVAATLGFGAYMSKAHG
TACGCTGCCCAAGGCTATAAGGTCCTC^TCCTGAATCCCTCCGTGG 

Gene : HepCla 

Segment # : 85
Offset : 1261
1st Codon : 1

AATLGFGAYMSKAHGIDPNIRTGVRTITTG
GCCGCTACCCTCGGCTTTGGCGCTTACATGAGCAAAGCCCATGGCATTGACCCTAACATTAGGACAGGCGT 

Gene : HepCla 

Segment # : 86
Offset : 1276
1st Codon : 1

IDPNIRTGVRTITTGSPITYSTYGKFLADG
ATCGATCCCAATATCAGAACCGGAGTGAGAACCATTACCACAGGCTCCCCCATTACCTATAGCACATAC 

Gene : HepCla 

Segment # : 87
Offset : 1291
1st Codon : 1

SPITYSTYGKFLADGGCSGGAYDIIICDEC
AGCCCTATCACATACTCCACCTATGGCAAATTCCTCGCCGATGGCGGATGCTCCGGCGG 

Gene : HepCla 

Segment # : 88
Offset : 1306
1st Codon : 1

GCSGGAYDIIICDECHSTDATSILGIGTVL
GGCTGTAGCGGAGGCGCTTACGATATCIATTATCTGTGACX3AATGC 

Gene : HepCla

Segment # : 89
Offset : 1321
1st Codon : 1

HSTDATSILGIGTVLDQAETAGARLVVLAT
CACTC»CCGATGCCACAAGCATTCTGGGAATCGGAACCGTCCTGGATCAC^CTGAGACAGCCGGAGC 

Gene : HepCla 

Segment # : 90
Offset : 1336
1st Codon : 1

DQAETAGARLVVLATATPPGSVTVPHPNIE
GACCAAGCCGAAACCGCTGGCGCTAGGCTCGTG^STCCTTC^CTACCGCTACCCCT 

Gene : HepCla 

Segment # : 91
Offset : 1351
1st Codon : 1

ATPPGSVTVPHPNIEEVALSTTGEIPFYGK
GCCACACCCCCTGGCTCCGTGACAGTGCCTCACCCTAACATTGAGGAAG7GGCTCT 

Gene : HepCla 

Segment # : 92
Offset : 1366
1st Codon : 1

EVALSTTGEIPFYGKAIPLEVIKGGRHLIF
GAGGTCGCCCTCAGCACAACCGGAGAGATTCCCTTTTACGGAAAGGCTATCCCTCTTGGAAGTGATTAAGG 

Gene : HepCla 

Segment # : 93
Offset : 1381
1st Codon : 1

AIPLEVIKGGRHLIFCHSKKKCDELAAKLV
GCCATTCCCCTCGAGGTCATCAAAGGCGGAAGGCATCTGATTTTCTGTCACT 

Gene : HepCla 

Segment # : 94
Offset : 1396
1st Codon : 1

CHSKKKCDELAAKLVALGINAVAYYRGLDV
TGCCATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGGCTCTGG^ 

Gene : HepCla 

Segment # : 95
Offset : 1411
1st Codon : 1

ALGINAVAYYRGLDVSVIPTSGDVVVVATD
GCCCTCGGCATTAACGCTGTGG<riTACTATAGGGGACTGGATCTC 
Gene : HepCla

Segment # : 96
Offset : 1426
1st Codon : 1

SVIPTSGDVVVVATDALMTGYTGDFDSVID
AGCGTCATCCCTACCTCCGGCGATGTGGTCGTGGTCGCCACAGACGCTCTGATGACCGGATACACAGGCGATTTCGATAGCGTCATCGAT

Gene : HepCla 

Segment # : 97
Offset : 1441
1st Codon : 1

ALMTGYTGDFDSVIDCNTCVTQTVDFSLDP
GCCCTCATGACAGGCTATACCGGAGACITTGACTCCGTGATTGACTGTAACACATGCGTCACCCAAACCGTCGACTTT 

Gene : HepCla 

Segment # : 98
Offset : 1456
1st Codon : 1

CNTCVTQTVDFSLDPTFTIETTTLPQDAVS
TGCAATACCTGTGTGACACAGACAGTGGATTTCTCCCTGGATCCCACATTCACAATC^ 

Gene : HepCla 

Segment # : 99
Offset : 1471
1st Codon : 1

TFTIETTTLPQDAVSRTQRRGRTGRGKPGI
ACCTTTACCATTGAGACAACCACACTGCCTCAGKSATCCCGTCAGCAGAACCCA^ 



Gene : HepCla 

Segment # : 100
Offset : 1486
1st Codon : 1

RTQRRGRTGRGKPGIYRFVAPGERPSGMFD
AGGACACAGAGAAGGGGAAGGACAGGCAGAGGGAAACCCGGAATCTATAGGTTTGTGGCTCCCG^ 



Gene : HepCla

Segment # : 101
Offset : 1501
1st Codon : 1

YRFVAPGERPSGMFDSSVLCECYDAGCAWY



Gene : HepCla 

Segment # : 102
Offset : 1516
1st Codon : 1

SSVLCECYDAGCAWYELTPAETTVRLRAYM
AGCTCCGTGCTCTGCGAATGCTATGACGCTGGCTGTGCCTGGTACGAACTGACACCCGCTGAGACAACrc 

Gene : HepCla 

Segment # : 103
Offset : 1531
1st Codon : 1

ELTPAETTVRLRAYMNTPGLPVCQDHLEFW
GAGCTCACCCCTGCCGAAACCACAGTGAGACTGAGAGCCTATATGAATACCCCT<^CCTCCCCGTCTGCCAAGACC^ 

Gene : HepCla 

Segment # : 104
Offset : 1546
1st Codon : 1

NTPGLPVCQDHLEFWEGVFTGLTHIDAHFL
AACACACCCGGACTGCCTGTGTGTCAGGATCACCTCGAGTTTTGGGAAGGCGTCTTCACAGGCCTCACCCATATCGATGCCC^ 

Gene : HepCla 

Segment # : 105
Offset : 1561 

1st Codon : 1 

EGVFTGLTHIDAHFLSQTKQSGENFPYLVA 
GAGGGAGTGTTTACCGGACTGACACACATTGACGCTCACTITCrGTCCCAGACAAAGCAAAGCGGAGAGAATT^ 



Gene : HepCla 

Segment # : 106
Offset : 1576

1st Codon : 1 

SQTKQSGENFPYLVAYQATVCARAQAPPPS 
AGCCAAACCAAACAGTCCGGCGAAAACTTTCCCTATCTGGT^ 



Gene : HepCla 

Segment # : 107
Offset : 1591
1st Codon : 1

YQATVCARAQAPPPSWDQMWKCLIRLKPTL
TACGAAGCCACAGTGTGTGCCAGAGCCCAAGCCCCTCCCCCTAGCrC^^ 



Gene : HepCla 

Segment # : 108
Offset : 1606
1st Codon : 1

WDQMWKCLIRLKPTLHGPTPLLYRLGAVQN
TGGGATCAGATGTGGAAATGCCTCATCAGACTGAAACCCACACTGCATGGCCCTACCCCT 



Gene : HepCla 

Segment # : 109
Offset : 1621 

1st Codon : 1 

HGPTPLLYRLGAVQNEVTLTHPVTKYIMTC 
CACGGACCCACACCCCTCCTGTATAGGCTCGGCGCTGTGCAAAACGAAGTGACACTGACACACCCT 



Gene : HepCla

Segment # : 110
Offset : 1636
1st Codon : 1

EVTLTHPVTKYIMTCMSADLEVVTSTWVLV



Gene : HepCla

Segment # : 111
Offset : 1651
1st Codon : 1

MSADLEVVTSTWVLVGGVLAALAAYCLSTG
ATCTCCGCCGATCTGGAAGTGGTCACCTCCACCTGGGTGCTCGTGGGAGGCGTC^ 



Gene : HepCla

Segment # : 112
Offset : 1666
1st Codon : 1

GGVLAALAAYCLSTGCVVIVGRIVLSGKPA
GGCGGAGTGCTCGCCGCrrCTGGCTGCCTATTGCCTCAGC^ 



Gene : HepCla 

Segment # : 113
Offset : 1681
1st Codon : 1

CVVIVGRIVLSGKPAIIPDREVLYREFDEM
TGCGTCGTGATTGTGGGAAGGATTGTGCTCAGCX^AAAGCCTGCCATTATCCCTGACAGAG 



Gene : HepCla 

Segment # : 114
Offset : 1696
1st Codon : 1

IIPDREVLYREFDEMEECSQHLPYIEQGMM
ATCATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGACGAAATGGAAGAGTGTAGCCAACACCTCCCCT 



Gene : HepCla 

Segment # : 115
Offset : 1711
1st Codon : 1

EECSQHLPYIEQGMMLAEQFKQKALGLLQT
GAGGAATGCTCCCAGCATCTGCCTTACATTGAGCAAGGCATGATGCT 



Gene : HepCla

Segment # : 116
Offset : 1726
1st Codon : 1

LAEQFKQKALGLLQTASRQAEVIAPAVQTN
CTGGCTGAGCAATTCAAACAGAAAGCCCTCC^CCTCra

Gene : HepCla

Segment # : 117
Offset : 1741 

1st Codon : 1 

ASRQAEVIAPAVQTNWQKLEVFWAKHMWNF 
GCCTCCAGGCAAGCCGAAGTGATTGCCCCTGCCGTCCAGACAAACTGGCAGAAACTGGA 

Gene : HepCla 

Segment # : 118
Offset : 1756
1st Codon : 1

WQKLEVFWAKHMWNFISGIQYLAGLSTLPG
TGGCAAAAGCTCGAGGTCTTCTGGGCCAAACACATGTGGAATTTCATTAGCGGAATCC^ 

Gene : HepCla 

Segment # : 119
Offset : 1771
1st Codon : 1

ISGIQYLAGLSTLPGNPAIASLMAFTAAVT
ATCTCCGGCATTCAGTATCTC^CTGGCCTCAGCACACTGCCTGG 

Gene : HepCla 

Segment # : 120
Offset : 1786
1st Codon : 1

NPAIASLMAFTAAVTSPLTTSQTLLFNILG
AACCCTGCCATTGCCTCCCTGATGGCCTTTACCGCTGCCGTCACCT 

Gene : HepCla 

Segment # : 121
Offset : 1801
1st Codon : 1

SPLTTSQTLLFNILGGWVAAQLAAPGAATA
AGCCCTCTGACAACCTCCCAGACACTGCTCTTTCAATATCCTCGG 

Gene : HepCla 

Segment # : 122
Offset : 1816
1st Codon : 1

GWVAAQLAAPGAATAFVGAGLAGAAIGSVG
GGCItXX3TGGCTGCCCAACTGGCTGCCCCTGGC^ 

Gene : HepCla 

Segment # : 123
Offset : 1831
1st Codon : 1

FVGAGLAGAAIGSVGLGKVLVDILAGYGAG
TTCGTCGGCGCTGGC<rrCGCCGGAGCCGCTAT03GAAG<XTCGGCCTCGGCAAAGTGCrrCGT 

Gene : HepCla 

Segment # : 124
Offset : 1846
1st Codon : 1

LGKVLVDILAGYGAGVAGALVAFKIMSGEV
CTGGGAAAGGTCCTGGTCGACATTCTGGCTGGCTAT^ 

Gene : HepCla 

Segment # : 125
Offset : 1861
1st Codon : 1

VAGALVAFKIMSGEVPSTEDLVNLLPAILS
GTGGCTGGCGCTCTGGTCGCCTTTAAGATTATGTCCGGCGAAGTGCCTAGCACAGAGGATCT 

Gene : HepCla 

Segment # : 126
Offset : 1876
1st Codon : 1

PSTEDLVNLLPAILSPGALVVGVVCAAILR
CCCTCCACCGAAGACCTCGTGAATCTGCTCCCCGCTA^ 

Gene : HepCla 
Segment # : 127
Offset : 1891
1st Codon : 1

PGALVVGVVCAAILRRHVGPGEGAVQWMNR
CCCGGAGCCCTCGTGGTCGGCGTCGTGTGTGCCGCTATCCTCAGGAGACACGTCGGCCC^ 

Gene : HepCla 

Segment # : 128 
Offset : 1906 
1st Codon : 1 

RHVGPGEGAVQWMNRLIAFASRGNHVSPTH 
AGGCATGTGGGACCCGGAGAGGGAGCCGTCCAGTGGATGAATAGGCTCATCGCTTTCGCTAGCAGAGGCAATCACGTCA 

Gene : HepCla 

Segment # : 129
Offset : 1921
1st Codon : 1

LIAFASRGNHVSPTHYVPESDAAARVTAIL
CTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTCCCCCACACACTATGTGCCTG 

Gene : HepCla 

Segment # : 130
Offset : 1936
1st Codon : 1

YVPESDAAARVTAILSSLTVTQLLRRLHQW
TACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAGCCATTCTGTCC^ 

Gene : HepCla 

Segment # : 131
Offset : 1951
1st Codon : 1

SSLTVTQLLRRLHQWISSECTTPCSGSWLR
AGCTCCCTGACAGTGACACAGCTCCTGAGAAGGCTCCACCAATGGATTAGCTCCGAGTGTACCACACCCTGTAGCGGAAGCTGG 

Gene : HepCla 

Segment # : 132
Offset : 1966
1st Codon : 1

ISSECTTPCSGSWLRDIWDWICEVLSDFKT
ATCTCCAGCGAATGCACAACCCCTTGCTCCGGCTCCTGGCTCAGGGATATCTG 

Gene : HepCla 

Segment # : 133
Offset : 1981
1st Codon : 1

DIWDWICEVLSDFKTWLKAKLMPQLPGIPF
GACATTTGGGATTGGATTTGCGAAGTGCTCAGCGATTTGAAAA 

Gene : HepCla

Segment # : 134
Offset : 1996
1st Codon : 1

WLKAKLMPQLPGIPFVSCQRGYKGVWRGDG
TGGCTCAAGGCTAAGCTCATGCCTCAGCTCCCCGGAATCCCTTTC^ 

Gene : HepCla 

Segment # : 135
Offset : 2011
1st Codon : 1

VSCQRGYKGVWRGDGIMHTRCHCGAEITGH
GTGTCCTGCCAAAGGGGATACAAAGGCGTCTTGGAGAGGCGATGGCATTATGCAT 

Gene : HepCla 

Segment # : 136
Offset : 2026 

1st Codon : 1 

IMHTRCHCGAEITGHVKNGTMRIVGPRTCR 
ATCATGCACACAAGCTGTCACTGTGGCGCTGAGATTACCGGACA 

Gene : HepCla

Segment # : 137
Offset : 2041
1st Codon : 1

VKNGTMRIVGPRTCRNMWSGTFPINAYTTG
GTGAAAAACGGAACCATGAGGATTGTGGGACCCAGAACCTGTAGGAATATGTGGAGCGGAACCTTTCCCATTAACGCTT • 

Gene : HepCla 

Segment # : 138
Offset : 2056
1st Codon : 1

NMWSGTFPINAYTTGPCTPLPAPNYTFALW
AACATGTGGTCCGGCACATTCCCTATCAATGCCTATACCACAGGCCCTTGC^ 

Gene : HepCla 

Segment # : 139
Offset : 2071
1st Codon : 1

PCTPLPAPNYTFALWRVSAEEYVEIRRVGD
CCCTGTACCCCTCTGCCTGCCCCTAACTATACCHTTGCCCTCT 

Gene : HepCla 

Segment # : 140
Offset : 2086
1st Codon : 1

RVSAEEYVEIRRVGDFHYVTGMTTDNLKCP
AGGGTCAGCtjCTCAGGAATACGTCGAGATTAGGAGAGTGGGAGACTTTCACT 



Gene : HepCla

Segment # : 141
Offset : 2101
1st Codon : 1

FHYVTGMTTDNLKCPCQVPSPEFFTELDGV



Gene : HepCla 

Segment # : 142
Offset : 2116
1st Codon : 1

CQVPSPEFFTELDGVRLHRFAPPCKPLLRE
TGCCAAGTGCCTAGCCCTGAGTTTTTCACAGAGCTCGACGGAGTGAGACTGCATAGGTTTGCCCCTCCCT 



Gene : HepCla 

Segment # : 143
Offset : 2131
1st Codon : 1

RLHRFAPPCKPLLREEVSFRVGLHEYPVGS
AGGCTCCACAGATTCGCTCCCCCTTGCAAACCCCTCCTGAGAGAGGAAGTGTCCTTCAGAGTGGGACTX3CATGAGTATCC 

Gene : HepCla 

Segment # : 144
Offset : 2146
1st Codon : 1

EVSFRVGLHEYPVGSQLPCEPEPDVAVLTS
GAGGTCAGCTTTAGGGTCGGCCTCCACGAATACCCTCTGGGAAGCCAACTGCCTT^ 

Gene : HepCla 

Segment # : 145
Offset : 2161
1st Codon : 1

QLPCEPEPDVAVLTSMLTDPSHITAEAAGR
CAGCTCCCCTGTGAGCCTGAGCCTGACGTCGCCGTCCTCACAAGCATGCTG 

Gene : HepCla 

Segment # : 146
Offset : 2176
1st Codon : 1

MLTDPSHITAEAAGRRLARGSPPSMASSSA
ATGCTCACCGATCCCTCCCACATTACCGCTGAGGCTGCCGGAA^ 

Gene : HepCla 

Segment # : 147
Offset : 2191
1st Codon : 1

RLARGSPPSMASSSASQLSAPSLKATCTAN
AGGCTCGCCAGAGGCTCCCCCCCTAGCATC^CCTCCAGCTCCGCCTCCCAGCTC^^ 
Gene : HepCla

Segment # : 148
Offset : 2206
1st Codon : 1

SQLSAPSLKATCTANHDSPDAELIEANLLW
AGCC^CTGTCCGCCCCTAGCCTCAAGGCTACCTGTAeCGCTAACCATGACTCCCCCGATGCCGAACTGATTGAGGCTAACCTCCT 



Gene : HepCla

Segment # : 149
Offset : 2221
1st Codon : 1

HDSPDAELIEANLLWRQEMGGNITRVESEN
CACGATAGCCCTGACGCTGAGCTCATCGAAGCCAATCTGCTCTGGAGACAGGAAATGGGAGGCAATATCA^ 

Gene : HepCla 

Segment # : 150
Offset : 2236
1st Codon : 1

RQEMGGNITRVESENKVVILDSFDPLVAEE
AGGCAAGAGATGGGCGGAAACATTACCAGAGTGGAAAGCGAAAACAAAGTGGTCAT^ 

Gene : HepCla 

Segment # : 151
Offset : 2251
1st Codon : 1

KVVILDSFDPLVAEEDEREISVPAEILRKS
AAGGTCGTGATTCTC^ATAGCTTTGACCCTCTGGTCGCCGAAGAGGATGAGAGAG 

Gene : HepCla 

Segment # : 152
Offset : 2266
1st Codon : 1

DEREISVPAEILRKSRRFAQALPVWARPDY
GACGAAAGGGAAATCTCCGTGCCTGCCGAAATCCTCAGGAAAAGCAGAAGGTTTGCCCAAGCCCTCCCCGTCTGGGCTAGGCCTGACTAT

Gene : HepCla 

Segment # : 153
Offset : 2281
1st Codon : 1

RRFAQALPVWARPDYNPPLVETWKKPDYEP 
AGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCCAGACCCGATTACAATCCCCCTCTGGTCGAGACATGGAAAAAGCCTC 

Gene : HepCla 

Segment # : 154
Offset : 2296
1st Codon : 1

NPPLVETWKKPDYEPPVVHGCPLPPPRSPP
AACCCTCCCCTCGTGGAAACCTGGAAGAAACCCGATTACGAACCCCCTGT^ 

Gene : HepCla 

Segment # : 155
Offset : 2311
1st Codon : 1

PVVHGCPLPPPRSPPVPPPRKKRTVVLTES 
CCOSTCC^CATC^CTGTCCCCTCCCCCCTCCCAGAAGCCCTCCCGTCC^ 

Gene : HepCla 

Segment # : 156
Offset : 2326
1st Codon : 1

VPPPRKKRTVVLTESTLSTALAELATKSFG
GTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTGCTCACCGAAAGCACACTGTCC^ 

Gene : HepCla 

Segment # : 157

Offset : 2341 

1st Codon : 1 

TLSTALAELATKSFGSSSTSGITGDNTTTS 
ACCCTGAGCACAGCCCTCGCCGAACTGGCTACCAAAAGCTTTGGCTCCAGCTCCACCTCCG 

Gene : HepCla 

Segment # : 158
Offset : 2356
1st Codon : 1

SSSTSGITGDNTTTSSEPAPSGCPPDSDAE
AGCTCCAGCACAAGCGGAATCACAGGCX5ATAACACAACCACAAGCTCCGAGCCTGCCCCTAGCGGA 



Gene : HepCla

Segment # : 159
Offset : 2371
1st Codon : 1

SEPAPSGCPPDSDAESYSSMPPLEGEPGDP
AGCGAACCCGCTCCCTCCGGCTGTCCCCCTGACTCCGACGCTGAGTCCTACTCCAGCATGCCCCCTCTG 



Gene : HepCla 

Segment # : 160
Offset : 2386
1st Codon : 1

SYSSMPPLEGEPGDPDLSDGSWSTVSSEAG
AGCTATAGCTCCATGCCTCCCCTCGAGGGAGAGCCTGGCGATCCCGATCTGTCCGACGGAAGCTGGAGCACAGTGTCCAGCGAAGCCGGA 



Gene : HepCla

Segment # : 161
Offset : 2401
1st Codon : 1

DLSDGSWSTVSSEAGTEDVVCCSMSYSWTG
GACCTCAGCGATGGCTCCTGGTCCACCGTCAGCTCCGAGGCTGGCACAGW 



Gene : HepCla 

Segment # : 162
Offset : 2416
1st Codon : 1

TEDVVCCSMSYSWTGALVTPCAAEEQKLPI
ACCGAAGACGTCGTGTGTTGCTCCATGTCCTACTCCTGGACAGGCGCTCTG^ 



Gene : HepCla

Segment # : 163
Offset : 2431
1st Codon : 1

ALVTPCAAEEQKLPINALSNSLLRHHNLVY
GCCCTCGTGACACCCTGTGCCGCTGAGGAACAGAAACTGCCTATC^ 



Gene : HepCla 

Segment # : 164
Offset : 2446 

1st Codon : 1 

NALSNS LLRHHNLVYSTTSRSACQRQKKVT 
AACGCTCTCTCCAACTCCCrGCTCAGGCATCACAATCTGGT^ 



Gene : HepCla

Segment # : 165
Offset : 2461
1st Codon : 1

STTSRSACQRQKKVTFDRLQVLDSHYQDVL
AGCACAACCTCCAGGTCCGCCTGTCAGAGACAGAAAAAGGTCACCTTTGACA^



Gene : HepCla 

Segment # : 166
Offset : 2476
1st Codon : 1

FDRLQVLDSHYQDVLKEVKAAASKVKANLL
TTCGATAGGCTCCAGGTCCTGGATAGCC^TTACCAAGACGTCCTGAAAGAGGTCAAGGCTGCCGCTAG^ 



Gene : HepCla 

Segment # : 167
Offset : 2491
1st Codon : 1

KEVKAAASKVKANLLSVEEACSLTPPHSAK
AAGGAAGTGAAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTCCTGTGCGTGGAAGAGGCTTGCTCCC^



Gene : HepCla

Segment # : 168
Offset : 2506
1st Codon : 1

SVEEACSLTPPHSAKSKFGYGAKDVRCHAR
AGCGTCGAC^AAGCCTGTAGCCTCACCCCTCCCCATAGCGCTAAGTCCAAGTTTC^ 
Gene : HepCla

Segment # : 169
Offset : 2521
1st Codon : 1

SKFGYGAKDVRCHARKAVAHINSVWKDLLE
AGCAAATTCG^ATACGGAGCOVAAGACGTCAGGTGTC^CGCTAGGAAA^ 

Gene : HepCla 

Segment # : 170
Offset : 2536
1st Codon : 1

KAVAHINSVWKDLLEDSVTPIDTTIMAKNE
AAGGCTGTGGCTCACATTAACTCCGTGTGGAAGGATCTGCTCGAGGAT^ 

Gene : HepCla 

Segment # : 171
Offset : 2551 

1st Codon : 1 

DSVTPIDTTIMAKNEVFCVQPEKGGRKPAR 
GACTCCGTGACACCCATTGACACAACCATTATGGCTAAGAATGAGGTCTTCTGTGTGCAA 

Gene : HepCla 

Segment # : 172
Offset : 2566
1st Codon : 1

VFCVQPEKGGRKPARLIVFPDLGVRVCEKM
GTGTTTrGCCTCCAGCCTGAGAAAGGCGGAAGGAAACCCGCTAGGCT 

Gene : HepCla 

Segment # : 173
Offset : 2581
1st Codon : 1

LIVFPDLGVRVCEKMALYDVVSKLPLAVMG
CTGATTGTGTTTCCCGATCTC^GAGTGAGAGTGTGTGAGAAAATGGCTCTGTAT 

Gene : HepCla 

Segment # : 174
Offset : 2596
1st Codon : 1

ALYDVVSKLPLAVMGSSYGFQYSPGQRVEF
GCCCTCTACGATGTGGTCAGCAAACTGCCTCTGGCTGTGATGGGCTCCAGCTATGG 

Gene : HepCla 

Segment # : 175
Offset : 2611 
1st Codon : 1 

SSYGFQYSPGQRVEFLVQAWKSKKTPMGFS 
AGCTCCTACGGATTCCAATACTCCCCCGGACAGAGAGTGGAATTCCTCCTGCAAGC 

Gene : HepCla 

Segment # : 176
Offset : 2626
1st Codon : 1

LVQAWKSKKTPMGFSYDTRCFDSTVTESDI
CTC^TCCAGGCTTGGAAAAGCAAAAAGACACCCATGGGCTTTAGCTATC 

Gene : HepCla 

Segment # : 177
Offset : 2641
1st Codon : 1

YDTRCFDSTVTESDIRTEEAIYQCCDLDPQ
TACGATACCAGATGCTTTGACTCCACCGTCACCGA^ 

Gene : HepCla 

Segment # : 178
Offset : 2656
1st Codon : 1

RTEEAIYQCCDLDPQARVAIKSLTERLYVG
AGGACAGAGGAAGCCATTTACCAATGCTGTGACCTCGACCCTCAGGCTAGGGTCGCCATTAAGTCCCTGACAGAGAGAC^ 

Gene : HepCla 

Segment # : 179
Offset : 2671
1st Codon : 1

ARVAIKSLTERLYVGGPLTNSRGENCGYRR
GCCAGAGTGGCTATCAAAAGCCTCACCGAAAGGCTCTACGTCGGCGGACCCC 



Gene : HepCla 

Segment # : 180
Offset : 2686
1st Codon : 1

GPLTNSRGENCGYRRCRASGVLTTSCGNTL
GGCCCTCTGACAAACTCCAGGGGAGAGAATTGCGGATACAGAAGGTGTAGGGCTAGC^ 

Gene : HepCla 

Segment # : 181
Offset : 2701
1st Codon : 1

CRASGVLTTSCGNTLTCYIKARAACRAAGL
TGCAGAGCCTCCGGCGTCCTGACAACCTCCTGCGGAAACACACTGACATGCTATATCAAAGCCAGAGCCGCTTGC^ 



Gene : HepCla 

Segment # : 182
Offset : 2716
1st Codon : 1

TCYIKARAACRAAGLQDCTMLVCGDDLVVI



Gene : HepCla 

Segment # : 183
Offset : 2731
1st Codon : 1

QDCTMLVCGDDLVVICESAGVQEDAASLRA
CAGGATTGCACAATGCTCGTGTGTGGCGATGACCTCGTGGTCATCTGTGAGTCCGCCGGA 



Gene : HepCla 

Segment # : 184
Offset : 2746
1st Codon : 1

CESAGVQEDAASLRAFTEAMTRYSAPPGDP
TGCGAAAGCGCTGGCGTCC^GGAAGACGCTGCCTCCCTGAGAGCCTTTACCGAAGCCATGACCAGATACTCCGCCCCTCCCGGAGACCCT 



Gene : HepCla 

Segment # : 185
Offset : 2761
1st Codon : 1

FTEAMTRYSAPPGDPPQPEYDLELITSCSS 
TTCACAGAGGCTATGACAAGGTATAGCGCTCCCCCTGGCGATCCCCCTCAGCCTGAGTATCACCTCGAGCTCATCACAAG 



Gene : HepCla

Segment # : 186
Offset : 2776
1st Codon : 1

PQPEYDLELITSCSSNVSVAHDGAGKRVYY



Gene : HepCla

Segment # : 187
Offset : 2791
1st Codon : 1

NVSVAHDGAGKRVYYLTRDPTTPLARAAWE
AACGTCAGCGTCGCCCATGACGGAGCCGGAAAGAGAGTGTATTACCTCACCAGAGACCCTACCACACCCCTCGCCAGAGCCGCTTGGGAA 



Gene : HepCla 

Segment # : 188
Offset : 2806
1st Codon : 1

LTRDPTTPLARAAWETARHTPVNSWLGNII
CTGACAAGGGATCCCACAACCCCTCTGGCTAGGGCTGCCTGGGAGACAGCCAGAC^ 



Gene : HepCla

Segment # : 189
Offset : 2821
1st Codon : 1

TARHTPVNSWLGNIIMFAPTLWARMILMTH
ACCGCTAGGCATACCCCTGTGAATAGCTGGCTGGGAAACATTATCATGTTCGCTCCCACACTGTGGGC^

Gene : HepCla 

Segment # : 190
Offset : 2836
1st Codon : 1

MFAPTLWARMILMTHFFSVLIARDQLEQAL
ATGTTTGCCCCTACCCTCTGGGCTAGGATGATCCTCATGACACACT 

Gene : HepCla 

Segment # : 191
Offset : 2851
1st Codon : 1

FFSVLIARDQLEQALDCEIYGACYSIEPLD
TTCTTTAGCGTCCTGATTGCCAGAGACCAACTGGAACAGGCTCTGGATTC 

Gene : HepCla 

Segment # : 192
Offset : 2866
1st Codon : 1

DCEIYGACYSIEPLDLPPIIQRLHGLSAFS
GACTGTCAGATTTACXX3AGCCTGTTACTCCATCGAACCCCTCGACCT 

Gene : HepCla 

Segment # : 193
Offset : 2881
1st Codon : 1

LPPIIQRLHGLSAFSLHSYSPGEINRVAAC
CTGCCTCCCATTATCCAAAGGCTCCACGGACTGTCCGCCTTTAGCCTCCACTCCTACrCCCCCG^ 

Gene : HepCla 

Segment # : 194
Offset : 2896
1st Codon : 1

LHSYSPGEINRVAACLRKLGVPPLRAWRHR
CTGCATAGCTATAGCCCTGGCGAAATCAATAGGGTCGCCGCTTGCCTCAGG 

Gene : HepCla 

Segment # : 195
Offset : 2911
1st Codon : 1

LRKLGVPPLRAWRHRARSVRARLLARGGRA
CTGAGAAAGCTCGGCGTCCCCCCTCTGAGAGCCTGGAGGCATAGGGCTAGGTCCGTGAGAGCCAGACTGCTCGCCAGAGGCGGAAGGGCT

Gene : HepCla 

Segment # : 196
Offset : 2926 

1st Codon : 1 

ARSVRARLLARGGRAAICGKYLFNWAVRTK 
GCCAGAAGCGTCAGGGCTAGGCTTCCTGGCTAGGGGAGGCAGAGCCGCTATCTGTGGCAAA^ 

Gene : HepCla 

Segment # : 197
Offset : 2941 

1st Codon : 1 

AICGKYLFNWAVRTKLKLTPIAAAGRLDLS 
GCCATTTGCGGAAAGTATCTGTTTAACTGGGCCGTCAGGACAAAGCTC 

Gene : HepCla 

Segment # : 198
Offset : 2956

1st Codon : 1 

LKLTPIAAAGRLDLSGWFTAGYSGGDIYHS 
CTGAAACTGACACCCATTGCCGCTGCCGGAAGX3CTCGACCTCAGCGGATGGTTTACCGCTGGCT 

Gene : HepCla 

Segment # : 199
Offset : 2971
1st Codon : 1

GWFTAGYSGGDIYHSVSHARPRWFWFCLLL
GGCTGGTTCACAGCCGGATACTCCGGCGGAG ACATTTACGATAGCGTCAGCCATGCCAGACCCAGATGGTTTTGG l i l iGCCTCCTGCTC 

Gene : HepCla 
Segment # : 200
Offset : 2986
1st Codon : 1

VSHARPRWFWFCLLLLAAGVGIYLLPNRAA
GTGTCCCACGCTAGGCCTAGGTGGTTCTGGTTCTGTCTGCTCCTGCrCGCC^ 



Gene : HepCla

Segment # : 201
Offset : 3001
1st Codon : 1

LAAGVGIYLLPNRAA
CTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAATAGGGCTGCC 
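
The layout of the listing above suggests a simple rule: each segment is a window of 30 residues, successive windows start 15 residues apart (so adjacent segments overlap by 15 residues), offsets are counted from 1, and the final window may be shorter. The following is a minimal sketch of that rule and of a scrambling step, not the procedure actually used to prepare the listing. The function names, the default window and step sizes, and the use of a seeded shuffle are assumptions made for illustration only.

```python
import random
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (segment number, 1-based offset, peptide)


def overlapping_segments(protein: str, size: int = 30, step: int = 15) -> List[Segment]:
    """Cut a parent sequence into windows of `size` residues starting every `step` residues.

    Hypothetical helper: window and step sizes are inferred from the listing,
    where segment 64 starts at offset 946 and segment 65 at offset 961.
    """
    return [
        (number, start + 1, protein[start:start + size])
        for number, start in enumerate(range(0, len(protein), step), start=1)
    ]


def scramble(segments: List[Segment], seed: int = 0) -> List[Segment]:
    """Return the segments in a shuffled order (illustrative only)."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Placeholder parent sequence of 3015 residues, not the real HepCla sequence.
    parent = "A" * 3015
    segments = overlapping_segments(parent)
    print(len(segments), segments[63][:2], segments[-1][:2])
```

Under these assumptions, a parent sequence of 3015 residues gives 201 segments, with segment 64 at offset 946 and a final 15-residue segment at offset 3001, which agrees with the offsets shown in the listing.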

Segments in scrambled order: 



HepCla #77 

VIPVRRRGDSRGSLLSPRPISYLKGSSGGP
GTGATTCCCGTCAGGAGAAGCGGAGACTCCAGGGGAAGCCTCCTGTCCCCC^GACCCATTAGCTATCTGAAAGGCTCCAGCG 

HepCla #68 

ARRGREILLGPADGMVSKGWRLLAPITAYA 
GCCAGAAGGGGAAGGGAAATCCTCCTGGGACCCGCTGACGGAATGGTCA 

HepCla #143 

RLHRFAPPCKPLLREEVSFRVGLHEYPVGS

HepCla #66

VVFSQMETKLITWGADTAACGDIINGLPVS
GTGGTCTTCTCCCAGATCGAGACAAAGCTCATCACATGGGGAGCCGATACCGCT 

HepCla #79 

LLCPAGHAVGIFRAAVCTRGVAKAVDFIPV
CTGCTCTGeCCTGCCGGACACGCTGTGGGAATCTTTA 

HepCla #113 

CVVIVGRIVLSGKPAIIPDREVLYREFDEM
TGCGTCGTGATTGTGGGAAGGATTGTGCTCAGCGGAAAGCCTGCCATTATCCCTGACAGAGAGGTCCTGTATAGGGAATTCGATGAGATG



HepCla #139

PCTPLPAPNYTFALWRVSAEEYVEIRRVGD
CCCTOTACCCCTCTGCCTGCCCXrrAACTATACCTTTGCCCrCTGGAG 

HepCla #174 

ALYDVVSKLPLAVMGSSYGFQYSPGQRVEF
GCCCTCTACGATGTGGTCAGCAAACTGCCTCTGGCTGTGATGGGCTCCAGCTATGGCTTTC 

HepCla #57 

ISWCLWWLQYFLTRVEAQLHVWVPPLNVRG
ATCTCCTGGTGTCTGTGGTGGCTCCAGTATTTCCTCACCAGAGTGGAAGCCCAACTC 

HepCla #51 

ENLVILNAASLAGTHGLVSFLVFFCFAWYL
GAGAATCTGGTCATCCTCAACGCTGCCTCCCTGGCTCXSCACACACX^A 

HepCla #193 

LPPIIQRLHGLSAFSLHSYSPGEINRVAAC
♦CTGCCTCCCAT/TATCCAAAGGCTCCACGGACTGTCCGCCTTTAGCCTCCACTCC^ 

HepCla #154 

NPPLVETWKKPDYEPPVVHGCPLPPPRSPP 
AACCCTCCCCTCGTGGAAACCTGGAAGAAACCC^ATTACGAACCCCCTGTGGTCCACGGATGCCCTCT^ 

HepCla #48

GVGSSIASWAIKWEYVVLLFLLLADARVCS
GGCGTCGGCTCCAGCATTGCCTCCTGGGCTATCAAATGGGAATACGTCGTGCTCC^ 

HepCla #37 

LNNTRPPLGNWFGCTWMNSTGFTKVCGAPP
CTGAATAACACAAGGCCTCCCCTCGGCAATTGGTTTGGCTGTACCTGGATGAATAGCACAGGCTTTACCAAAGTGTGTGGCX^ 

HepCla #185 

FTEAMTRYSAPPGDPPQPEYDLELITSCSS
TTCACAGAGGCTATGACAAGGTATAGCGCTCCCCCTGGCGAT^



HepCla #54 

WPLLLLLLALPQRAYALDTEVAASCGGVVL 
TGGCCTCTGCTCCTGCTCCTGCTCGCCCTCCCCCAAAG^ 

HepCla #70 

QQTRGLLGCIITSLTGRDKNQVEGEVQIVS
CAGCAAACCAGAGGCCTCCTGGGATGCATTATCACAAGCCTCACCGGAAGGGATAAGAATCAGGTCGAGGGAGAGGTCCAGATTC 

HepCla #82 

SSPPAVPQSFQVAHLHAPTGSGKSTKVPAA
AGCTCCCCCCCTGCCGTCCCCCAAAGCTTTCAGGTCGCCCATCT 



HepCla #104

NTPGLPVCQDHLEFWEGVFTGLTHIDAHFL

HepCla #26

VLLLFAGVDAETHVTGGNAGRTTSGLVSLL



HepCla #110 

EVTLTHPVTKYIMTCMSADLEVVTSTWVLV
GAGGTCACCCTCACCCATCCCGTCACCAAATACATTATGACATGCATGAGCGC^ 

HepCla #56 

VGLMALTLSPYYKRYISWCLWWLQYFLTRV
GTG^XjACnXSATGGCCCTCACCCTCAGCCCTTACTATAAGAGATACATTAGCTGGTC 

HepCla #197 

AICGKYLFNWAVRTKLKLTPIAAAGRLDLS 
GCCATTTGCGGAAAGTATCTGTTTAACTGGGCCGTCAGGACAAAGCT^ 

HepCla #25

IAYFSMVGNWAKVLVVLLLFAGVDAETHVT
ATCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAAGTGCTCGTGCTCCTGCTCCTGTTTGCCC^AG 

HepCla #147 

RLARGSPPSMASSSASQLSAPSLKATCTAN
AGGCTCGCCAGAGGCTCCCCCCCTAGCATGGCCTCCAGCTCaSC^ 

HepCla #52 

GLVSFLVFFCFAWYLKGRWVPGAVYALYGM 
GGCCTCGTGTCCTTCCTCGTGTTTTTCTGTTTCGCT^ 

HepCla #145 

QLPCEPEPDVAVLTSMLTDPSHITAEAAGR
CAGCTCCCCTGTGAGCCTGAGCCTGAO;TCC<:CGTCCTGAC^GC^TGCrGACAGACCCTAGCCA 

HepCla #171 

DSVTPIDTTIMAKNEVFCVQPEKGGRKPAR 
C^CTCCGTGACACCCATTGACACAACCATTATGGCTAAGAATGA 

HepCla #84 

YAAQGYKVLVLNPSVAATLGFGAYMSKAHG
TACGCTGCCCAAGGCTATAAGGTCCTGGTCCTGAATCCCTCCGTGGCTGCCACACTGGGATTCGGAGCCTAT^ 

HepCla #14 

VRNSTGLYHVTNDCPNSSIVYEAADAILHT
GTGAGAAACTCCACCC^ACTGTATCACGTCACCAATGACTGTCCCAATAGCT 

HepCla #175 

SSYGFQYSPGQRVEFLVQAWKSKKTPMGFS
AGCTCCTACGGATTCCAATACTCCCCCGGACAGAGAGTGGAATTCCTCGTGCAAGCCTGGAAGTCCAAGAAAACCC 

HepCla #67

DTAACGDIINGLPVSARRGREILLGPADGM
GACACAGCCGCTTGCGGAGACATTATCAATGGCCTCCCCGTCAGCGCTAGGAGAGGCAGAGAGAT^ 

HepCla #148 

SQLSAPSLKATCTANHDSPDAELIEANLLW
AGCCAACTGTCaSCCCCTAGCCTCAAGGCTACCTGTACCGCTAACCATGACrCCCCCGATGCCGAACTG 
HepCla #120

NPAIASLMAFTAAVTSPLTTSQTLLFNILG
AACCCTGCCATTGCerCCCTGATGGCCTTTACCGCTGCCGTCACCTCCCCCCTCACCACAAGCCAAACCCT 

HepCla #176 

LVQAWKSKKTPMGFSYDTRCFDSTVTESDI 
CTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCATGGGCTTTAGCTATGACACAAGGTGTTTCGATAGCACA 

HepCla #152 

DEREISVPAEILRKSRRFAQALPVWARPDY
GACGAAAGGGAAATCTCCGTGCCTGCCGAAATCCTCACkSAAAAGCAGAAGGTTTGCCCAAGCCCT 

HepCla #190 

MFAPTLWARMILMTHFFSVLIARDQLEQAL
ATGTTTGCCCCTACCCTCTGGGCTAGGATGATCCTCATGACACACTTTTTCTCCGTGCTCATCGCTAGGGATCAGCTCGA 



HepCla #96

SVIPTSGDVVVVATDALMTGYTGDFDSVID



HepCla #94 

CHSKKKCDELAAKLVALGINAVAYYRGLDV
TGCCATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGGCTCTGGGAATCAATGCCGTCGCCTATTACAGAGGCCTCGACGTC

HepCla #46 

VLPCSFTTLPALSTGLIHLHQNIVDVQYLY
GTCCTCCCCTGTAGCITTACCACACTCCCT^ 

HepCla #53 

KGRWVPGAVYALYGMWPLLLLLLALPQRAY 
AAGGGAAGGTGGGTGCCTGGCGCTGTGTATGCCCTCTACGGAATGTGGCCCCT 

HepCla #87 

SPITYSTYGKFLADGGCSGGAYDIIICDEC
AGCCCTATCACATACTCCACCTATGGCAAATTCCTCGCCGATGGCGGATGCTCCGGCGGAGCCTATGACATTATCATTTGCGATGAGTGT

HepCla #196 

ARSVRARLLARGGRAAICGKYLFNWAVRTK
GCCAGAAGCGTCAGGGCTAGGCTCCTGGCTAGGGGAGGCAGAGCCGCTATCTGTGGCAAATACCT 

HepCla #170 

KAVAHINSVWKDLLEDSVTPIDTTIMAKNE
AAGGCTGTGGCTCACATTAACTCCGTGTGGAAGGATCTGCTCGAGGATAGCGTCACCCCTATCGATACCACAATCATGGCCAAAAACGAA 

HepCla #35 

FTPSPVVVGTTDRSGAPTYSWGANDTDVFV



HepCla #16 

PGCVPCVREGNASRCWVAMTPTVATRDGKL
CCCGGATGCGTCCCCTGTGTGAGAGAGGGAAACGCTAGCAGATGCTGGGTG<5CTATGACA 

HepCla #183 

QDCTMLVCGDDLVVICESAGVQEDAASLRA 
CAGGATTGCACAATGCTGGTGTGTGGCGATGACXrrCGTGGTCATCTGTGAGTCCGCCGGAGTGCAAGAGGATC 

HepCla #125 

VAGALVAFKIMSGEVPSTEDLVNLLPAILS
GTGGCTGGCGCTCTGGTCGCCTTTAAGATTATGTCCGGC^AAGTGC 

HepCla #177 

YDTRCFDSTVTESDIRTEEAIYQCCDLDPQ
TACGATACCAGATGCTTTGACTCCACCGTC^CCGAAAGCGATATCAGAACCGAAGAGGCTATCTATCAGTGTTGCGATCTGG 

HepCla #103 

ELTPAETTVRLRAYMNTPGLPVCQDHLEFW 
GAGCrCACCCCrGCCGAAACC^CAGTGAGACTCAGAGCCTATATGAATACCCCTGGCCTCCCTO 



HepCla #186 

PQPEYDLELITSCSSNVSVAHDGAGKRVYY
CCCCAACCCGAATACGATCTGGAACTGATTACCTCCTGCTCCAGCAATGTGTCCGTG 
HepCla #9

LGKVIDTLTCGFADLMGYIPLVGAPLGGAA
CTGGGAAAGGTCATCGATACCCTCACCTGTGGCTTTGCCGATCTGATGGGCTATATCCCTCTGGTCGGC^

HepCla #93

AIPLEVIKGGRHLIFCHSKKKCDELAAKLV
GCCATTCCCCTCGAGGTC^TCAAAGGCGGAAGGCATCTCATTTTCTGTCACTCCAAGAAAAAG 

HepCla #112 

GGVLAALAAYCLSTGCVVIVGRIVLSGKPA
GGCGGAGTGCTCGCCGCTCTGGCTGCCTATTGCCTCAGCACAGGCTGTGTGGTCATCCT 

HepCla #184 

CESAGVQEDAASLRAFTEAMTRYSAPPGDP
TGCCJ^GCGCTC^CGTCCAGGAAGACGCTGCCTCCCTGAGAGCCTTTACCGAAGCCATGA 

HepCla #199 

GWFTAGYSGGDIYHSVSHARPRWFWFCLLL 
GGCTCGTTCACAGCCGGATACrTCCGGCGGAGACATTTACCATAGCGTCAGCCATGCCAGACCCAGATG 

HepCla #158 

SSSTSGITGDNTTTSSEPAPSGCPPDSDAE
AGCTCCAGCACAAGCOTAATCACAGGCGATAACACAACCACAAGCTCCGAGCCTCCrc 

HepCla #100 

RTQRRGRTGRGKPGIYRFVAPGERPSGMFD
AGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCCGGAATCTATAGGTTTGTGGCTCCCGGAGAGAGACCCTCCGGCATGTTCGAT

HepCla #43 

VRMYVGGVEHRLEAACNWTRGERCDLEDRD 
GTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCrrCGAGGCTGCCTGTAACTGGACCAGAGGCGAAAGGTC 

HepCla #58 

EAQLHVWVPPLNVRGGRDAVILLMCVVHPT



HepCla #4 

LGVRATRKTSERSQPRGRRQPIPKARRPEG
CTGGCAGTGAGAGCCACAAGCAAAACCTCCGAGAGAAGCCAACCCAGAGGCAGAAGGCAACCC^ 

HepCla #187 

NVSVAHDGAGKRVYYLTRDPTTPLARAAWE
AACGTCAGCGTCGCCCATGACC^AGCCG^AAAGAGAGTGTATTACCTCACCAGAGACCCTACCACACCCCTCGCCAGAGCCGCTTGGGAA 

HepCla #159 

SEPAPSGCPPDSDAESYSSMPPLEGEPGDP 
AGCGAACCCGCTCCCTCCGGCTGTCCCCCTGACTCCGACGCTGAGTCCTACTCCAGCATGCCC^ 

HepCla #63

IGGHYVQMAIIKLGALTGTYVYNHLTPLRD
ATCGGAGGCCATTACGTCCAGATGGCCATTATCAAACTGGGAGCCCTCACCGGAACCTATGTGTATAACCATCTGACACCCCTCAGGGAT 

HepCla #126

PSTEDLVNLLPAILSPGALVVGVVCAAILR
CCCTCCACCGAAGACCTCGTGAATCTGCTCCCCGCTATCCTCAGCCCTK^CGCTCT 

HepCla #24 

ILDMIAGAHWGVLAGIAYFSMVGNWAKVLV
ATCCTCGACATGATCGCTTGGCGCTCACTGGGGCGTCCTGGCTGGCATTGCCTATTTCT 

HepCla #7 

EGCGWAGWLLSPRGSRPSWGPTDPRRRSRN 
GAGGGATGCGGATGGGCTGGCTC^CTGCTCAGCCCTAGGGGAAGCAGACCCT^ 

HepCla #21

WTTQGCNCSIYPGHITGHRMAWDMMMNWSP 
TGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCCATATCACAGGCCATAGGATGGCCTC 

HepCla #17

WVAMTPTVATRDGKLPATQLRRHIDLLVGS
TGGGTCGCCATGACCCCTACCGTCGCCACAAGGGATGGCAAACIXXTCTGCCACACAGCTCftGGAGACACATTGA 



HepCla #42

RLWHYPCTINYTIFKVRMYVGGVEHRLEAA
AGGCTCTGGCATTACCCTTGCACAATCAATC 

HepCla #172 

VFCVQPEKGGRKPARLIVFPDLGVRVCEKM 
GTGTTTTGCGTCCAGCCTGAGAAAGGCGGAAGGAAACCCGCTAGGCTCATCGTCTTCCCT 

HepCla #10

MGYIPLVGAPLGGAARALAHGVRVLEDGVN 
ATGGGATACATTCCCCTCGTCGGAGCCCCrCTGGGAGGTC 

HepCla #27 

GGNAGRTTSGLVSLLTPGAKQNIQLINTNG
GGCGGAAACGCTGGCAGAACCACAAGCGGACTGGTCAGCCTCCTGACACCCGGAGCCA 

HepCla #13 

LALLSCLTVPASAYQVRNSTGLYHVTNDCP
CTGGCTCTGCrCAGCTGTCTGACAGTGCCTGCCTCCGCCTATCAGGTCAGGAATAGCACAGGCCTCTACC^ 

HepCla #71 

GRDKNQVEGEVQIVSTAAQTFLATCINGVC
GGCAGAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATCGTCAGCACAGCCGCTCAGACATTCCTCGCCACATGCATTAACGGAGTGTGT

HepCla #18 

PATQLRRHIDLLVGSATLCSALYVGDLCGS
CCCGCTACCCAACTGAGAAGGCATATCGATCTGCTCGTGGGAAGCGCTACCC^ 

HepCla #83 

HAPTGSGKSTKVPAAYAAQGYKVLVLNPSV
CACGCTCCXTACAGGCTCCGGCaAAAGCACAAAGGTCCCCGCTGCCTATG 

HepCla #6 

RTWAQPGYPWPLYGNEGCGWAGWLLSPRGS
AGGACA.TGGGCTCAGCCTGGCTATCCCTGGCCCCrrCTACGGAAACGAAGGCTGTGGCTGGGCCGGATGGCT 

HepCla #162 

TEDVVCCSMSYSWTGALVTPCAAEEQKLPI
ACCGAAGACGTCGTGTGTTGCrCCATGTCCTACTCCTGGACAGGCGCTCTGGTCACCCCTTGCGCTGCCGAAGAGCAAAAGCT 

HepCla #55 

ALDTEVAASCGGVVLVGLMALTLSPYYKRY
GCCCTCGACACAGAGGTCGCCGCTAGCTGTGGCGGAGTGGTCCTGGTCXK5CCTCATGGCTCTGACACTGTCCCCCTATTACAAAAGGTAT 

HepCla #38 

WMNSTGFTKVCGAPPCVIGGAGNNTLHCPT 
TGGATGAACTCCACCGGATTCACAAAGGTCTGCGGAGCCCerCCCTGTGTGATTGGCGGAGCCGGAAACAATACCCTCCACTC 

HepCla #168 

SVEEACSLTPPHSAKSKFGYGAKDVRCHAR
AGCGTCGAGGAAGCCTGTAGCCTCACCCCTCCCCATAGCGCTAAGTCCAAGTTTGGCTATGGCGCTAAGGATGTGAGATGCCATGCC^ 

HepCla #119 

ISGIQYLAGLSTLPGNPAIASLMAFTAAVT
ATCTCCGGCATTCAGTATCTGGCTGGCCTCAGCACACTGCCTGGCAATCCCG 

HepCla #3 

QIVGGVYLLPRRGPRLGVRATRKTSERSQP
CAGATTGTGGGAGGCGTCTACCTCCTGCCTAGGAGAGGCCCTAGGCTCGGCGTCAGGGCTACCAGAAAGACAAGCGAAAGGTCCCAGCCT

HepCla #194 

LHSYSPGEINRVAACLRKLGVPPLRAWRHR
CTGCATAGCTATAGCCCTGGCGAAATCAATAGGGTCGCCGCTTGCCTCAGGAAACTGGGAGTGCCTCCCCTCAGG 

HepCla #189

TARHTPVNSWLGNIIMFAPTLWARMILMTH
ACCGCTAGGCATACCCCTGTGAATAGCTGGCTGGGAAACATTATCATGTTCGCTCCCACACTGTGGGCCAGAATGATTCTGATGACCCAT 

HepCla #81 

ENLETTMRSPVFTDNSSPPAVPQSFQVAHL 
GAGAATCTGGAAACCACAATGAGAAGCCCTGTGTTTACCGAT^ 

HepCla #91 

ATPPGSVTVPHPNIEEVALSTTGEIPFYGK
GCCACACCCCCTGGCTCCGTGACAGTGCCTCACCCTAACATTGAGGAAGTGGCTCT



HepCla #60 
L V F D 



F G 



L 0 



HepCla #23 

TAALVMAQLLRIPQAILDMIAGAHWGVLAG
ACCGCTGCCCTCGTGATGGCCCAACTGCTCAGGATTCCCCAAGCCATTCTC 

HepCla #98 

CNTCVTQTVDFSLDPTFTIETTTLPQDAVS 
TGCAATACCTGTGTGACACAGAGAGTGGATTTCTCCCTGGA 

HepCla #109 

HGPTPLLYRLGAVQNEVTLTHPVTKYIMTC
CACGGACCCACACCCCTCCTGTATAGGCTCGGCGCTGTGCAAAACGAA^ 

HepCla #179 

ARVAIKSLTERLYVGGPLTNSRGENCGYRR
gccagagtggctatcaaaagcctcaccgaaaggctctao;tcggcg^acccctcaccaatagcaga 

HepCla #39 

CVIGGAGNNTLHCPTDCFRKHPEATYSRCG 
TGCGTCATCGGAGGCGCTGGCAATAACACACTGCATTGCCCTACCGATTGC^^

HepCla #76 

TCGSSDLYLVTRHADVIPVRRRGDSRGSLL 
ACCTGTGGCTCCAGCGATCrGTATCTGGTCACC^GACACGCTGACGTCATCCCTGTGAGAAGGAGAGGC 

HepCla #138 

NMWSGTFPINAYTTGPCTPLPAPNYTFALW
AACATGTGGTCCGGCACATTCCCTATCAATGCCTATACCACAGGCCCTTGCAC^ 

HepCla #89

HSTDATSILGIGTVLDQAETAGARLVVLAT
CACTCCACCGATGCCACAAGCATTCTGGGAATCGGAACCGTCCT 



HepCla #130 

YVPESDAAARVTAILSSLTVTQLLRRLHQW 
TACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAGCCATTCTGTCCAGCCrTCACCGTCACCCAACTG 



HepCla #8 

RPSWGPTDPRRRSRNLGKVIDTLTCGFADL 
AGGCCTAGCrGGGGCCCrrACCGATCCCAGAAGGAGAAGCAGAAACCTCGGCAAAGTGATTGACACACTGACATGCGGATTCGCT 



HepCla #33 

GPDQRPYCWHYPPKPCGIVPAKSVCGPVYC
GGCCCTGACCAAAGGCCTTACTGTTGGCATTACCCTCCCAAACCCTGTGGCATTGTGC 

HepCla #115 

EECSQHLPYIEQGMMLAEQFKQKALGLLQT 
GAGGAATGCTCCC^GCATCTGCCTTACATTGAGCAAGGCATGATGCTCGCCGAAC^ 

HepCla #107 

YQATVCARAQAPPPSWDQMWKCLIRLKPTL
TACCAAGCCACAGTGTGTGCCAGAGCCCAAGCCCCTCCCCCTAGCTGGGACCAAATGTGGAAGTGT^ 

HepCla #34

CGIVPAKSVCGPVYCFTPSPVVVGTTDRSG
TGCGGAATCGTCCCCGCTAAGTCCGTGTGTGGCCCTGTGTATTGCTTTACCCCTAGCCCTGTGGTCGTGGGAACC 

HepCla #131 

SSLTVTQLLRRLHQWISSECTTPCSGSWLR
AGCTCCCTGACAGTGACACAGCTCCTGAGAAGGCTCCACCAATGGATTAGCTCCGAGTGTACCACACCCTGTAG 

HepCla #161 

DLSDGSWSTVSSEAGTEDVVCCSMSYSWTG 
GACCTCAGCGATGGCTCCTGGTCCACCGTCAGCTCCGAGGCTGGCACAGAGGATGTGGTCTC 

HepCla #108 

WDQMWKCLIRLKPTLHGPTPLLYRLGAVQN 
TGGGATCAGATGTGGAAATGCCTCATCAGACTGAAACCCACACTG

HepCla #116

LAEQFKQKALGLLQTASRQAEVIAPAVQTN 
CTGGCTGAGCAATTCAAACAGAAAGCCCTCGGCCTCCTGCA^ 

HepCla #118 

WQKLEVFWAKHMWNFISGIQYLAGLSTLPG 
TGGCAAAAGCTCGAGGTCTTCTGGGCCAAACACATGTGGAATTTCATTAGCGGAA

HepCla #129 

LIAFASRGNHVSPTHYVPESDAAARVTAIL 
CTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTC 

HepCla #19 

ATLCSALYVGDLCGSVFLVGQLFTFSPRRH
GCCACACTGTGTAGCGCTCTGTATGTGGGAGACCTCTGCGGAAGCGTCTTCCTCGTGGGACAGCTC^ 

HepCla #102 

SSVLCECYDAGCAWYELTPAETTVRLRAYM 
AGCTCCGTGCTCTGCGAATGCTATGACGCTGGCTGTGCCTGGTACGA^

HepCla #122

GWVAAQLAAPGAATAFVGAGLAGAAIGSVG 
GGCTGGGTGGCTGCCCAACTGGCTGCCCCTGGCGCTGCCACA^ 

HepCla #29

SWHINSTALNCNESLNTGWLAGLFYQHKFN
AGCTGGCACATTAACTCCACCGCTCTGAATTGCAATGAGTCCCTGAATACC^

HepCla #164 

NALSNSLLRHHNLVYSTTSRSACQRQKKVT 
AACGCTCTGTCCAACTCCCTGCTCAGGCATCACAATCTGGTCTACTCCAC^ 

HepCla #1 

AAMSTNPKPQRKTKRNTNRRPQDVKFPGGG 
GCCGCTATGTCCACCAATCCCAAACCCCAAAGGAAAACCAAAAGGAATACCAATAGGAGACCCCA^

HepCla #106 

SQTKQSGENFPYLVAYQATVCARAQAPPPS
AGCCAAACCAAACAGTCCGGCGAAAACTTTCCCTATCTGGTCGCCTATCAGGCTAC 

HepCla #36 

APTYSWGANDTDVFVLNNTRPPLGNWFGCT 
GCCCCTACCTATAGCTGGGGCGCTAACGATACCGATGTGTTTGTGCTCAACAAT^ 

HepCla #156 

VPPPRKKRTVVLTESTLSTALAELATKSFG
GTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTGCTCACCGAAAGCACACTGTCCACCGCTCTGGCTGAGCT 

HepCla #165 

STTSRSACQRQKKVTFDRLQVLDSHYQDVL 
AGCACAACCTCCAGGTCCGCCTGTCAGAGACAGAAAAAGGTCACCTTTGACAGACTGCAAGTGCT 

HepCla #90 

DQAETAGARLVVLATATPPGSVTVPHPNIE
GACCAAGCCGAAACCGCIXKSCGCTAGGCTCGTGGTCCTGGCTACCGCTACCCCTCCCGGAAGCGTCACra 

HepCla #141 

FHYVTGMTTDNLKCPCQVPSPEFFTELDGV
TTCCATTACGTCACCGGAATGACAACCGATAACCTCAAGTGTCCCTGTCAGGTCCCCTCCCCCGAATTCTTTACCGAACTGGATGGCGTC

HepCla #198 

LKLTPIAAAGRLDLSGWFTAGYSGGDIYHS
CTGAAACTGACACCCATTGCCGCTGCCGGAAGGCTCGACCTCAGCGGATGGTTTACCGCTG 

HepCla #117 

ASRQAEVIAPAVQTNWQKLEVFWAKHMWNF 
GCCTCCAGGCAAGCCGAAGTGATTGCCCCTGCCGTCCAGACAAACTGGCAGAAACTGGAA 

HepCla #181 

CRASGVLTTSCGNTLTCYIKARAACRAAGL
TGCAGAGCCTCCGGCGTCCTGACAACCTCCTGCGGAAACACACTGACATGCra

HepCla #166

FDRLQVLDSHYQDVLKEVKAAASKVKANLL 
TTCGATAGGCTCCAGGTCCTGGATAGCCATTACCAAGACGTCCTGAAAGAGGTCAAGGCTGCCGCTAGCAAAGTGAAAGCCAATCTGCTC 

HepCla #180 

GPLTNSRGENCGYRRCRASGVLTTSCGNTL
GGCCCTCTGACAAACTCCAGGGGAGAGAATTGCGGATACAGAAGGTGTAGGGCTAGCGGAGTGCTCACCACAAGCTGTGGCAATACCCTC 

HepCla #136 

IMHTRCHCGAEITGHVKNGTMRIVGPRTCR
ATCATGCACACAAGGTGTCACTGTGGCGCTGAGATTACCGGACACGTCAAGAATGGCACAATGAGAATCGTCGGCCCTAGGAC 

HepCla #144

EVSFRVGLHEYPVGSQLPCEPEPDVAVLTS
GAGGTCAGCTTTAGGGTCGGCCTCCACGAATACCCTGTGGGAAGCCAACTGCCTTGCGAACCCGAACCCGA 

HepCla #167 

KEVKAAASKVKANLLSVEEACSLTPPHSAK 
AAGGAAGTGAAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTCCTGTCCGTG

HepCla #59 

GRDAVILLMCVVHPTLVFDITKLLLAVFGP
GGCAGAGACGCTGTGATTCTGCTCATGTGTGTGGTCCACCCT^ 

HepCla #146

MLTDPSHITAEAAGRRLARGSPPSMASSSA
ATGCTCACCGATCCCTCCCACATTACCGCTGAGGCTGCCGGAAGGAGACTGGCTAG 

HepCla #78 

SPRPISYLKGSSGGPLLCPAGHAVGIFRAA
AGCCCTAGGCCTATCTCCTACCTCAAGGGAAGCTCCGGCGGACCCCTCCTGTGTCC

HepCla #32 

DFDQGWGPISYANGSGPDQRPYCWHYPPKP
GACTTTGACCAAGGCTGGGGCCCTATCTCCTACGCTAACGGAAGCGGACCCGATCAGAGACCCTATTC 

HepCla #128 

RHVGPGEGAVQWMNRLIAFASRGNHVSPTH
AGGCATGTGGGACCCGGAGAGGGAGCCGTCCAGTGGATGAATAGGCTCATCGCTTT^

HepCla #50 

CLWMMLLISQAEAALENLVILNAASLAGTH
TGCCTCTGGATGATGCTCCTGATTAGCCAAGCCGAAGCCGCTCTGGAAAACCTCGTGATTCTGAATGCCGCTAGCCTCGCCGGAA 

HepCla #114 

IIPDREVLYREFDEMEECSQHLPYIEQGMM
ATCATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGACGAAATGGAAGAGTGTAGCCAACACCTCCCCTATA 

HepCla #47

LIHLHQNIVDVQYLYGVGSSIASWAIKWEY
CTGATTCACCTCCACCAAAACATTGTGGATGTGCAATACCTCTACGGAGTGGGAAGCTCCATCG

HepCla #200 

VSHARPRWFWFCLLLLAAGVGIYLLPNRAA
GTGTCCCACGCTAGGCCTAGGTGGTTCTGGTTCTGTCTGCTCCTGCTCGCCGC 

HepCla #85

AATLGFGAYMSKAHGIDPNIRTGVRTITTG
GCCG CTACCCTTCGGCTTTGGCGCTT AC ATGAGC AAAGCC CATGGCATTG ACCCTAACATTAGG AC AGG CGTC AGG AC AATC ACAACCGG A 

HepCla #62 

RVQGLLRICALARKMIGGHYVQMAIIKLGA
AGGGTCCAGGGACTGCTCAGGATTTGCGCrCTGGCTAGGAAAATGATTGGCGGACACTATG 

HepCla #153 

RRFAQALPVWARPDYNPPLVETWKKPDYEP
AGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCCAGACCCGATTACAATCCCCCTCTC 

HepCla #72 

TAAQTFLATCINGVCWTVYHGAGTRTIASP
ACCGCTGCCCAAACCTTTCTGGCTACCTGTATCAATGGCGTCTGCTGGACCGTCTACCATG 



HepCla #65

WAHNGLRDLAVAVEPVVFSQMETKLITWGA
TGGGCTCACAATGGCCTCAGGGATCTGGCTGTGGCTGTGGAACCCGT^ 

HepCla #74 

KGPVIQMYTNVDQDLVGWPAPQGSRSLTPC
AAGGGACCCX3TCATCCAAATGTATACCAATGTGGATCAGGATCTGGTCCGCIX5GCCC<5CTCCCCAAGGCT 

HepCla #151 

KVVILDSFDPLVAEEDEREISVPAEILRKS 
AAGGTCGTGATTCTGGATAGCTTTGACCCTCTGGTCGCCGAAGAGGATGAGAGAGA 

HepCla #64 

LTGTYVYNHLTPLRDWAHNGLRDLAVAVEP
CTGACAGGCACATACGTCTACAATCACCTCACCCCTCTGAGAGACTGGGCCCAT 

HepCla #80

VCTRGVAKAVDFIPVENLETTMRSPVFTDN
GTGTGTACCAGAGGCGTCGCCAAAGCCG^CGACTTTATCCCTGTGGAAAACCTCGAGAC^^ 

HepCla #95 

ALGINAVAYYRGLDVSVIPTSGDVVVVATD 
GCCCTCGGCATTAACGCTGTGGCTTACTATAGGGGACTGGATGTGTCCGTGATTCCCACAAGCGGAGACGTCGTGGTCGTGGCTACCGAT

HepCla #111 

MSADLEVVTSTWVLVGGVLAALAAYCLSTG 
ATGTCCGCCGATCTGGAAGTGGTCACCTCCACCTGGGTGCTCGTGGGAGG 

HepCla #97 

ALMTGYTGDFDSVIDCNTCVTQTVDFSLDP
GCCCTCATGACAGGCTATACCGGAGACTTTGACTCCGTGATTGACTGTAA 

HepCla #2 

NTNRRPQDVKFPGGGQIVGGVYLLPRRGPR
AACACAAACAGAAGGCCTCAGGATGTGAAATTCCCTGGCGGAGGCCAAATCGTCGGC 

HepCla #11 

RALAHGVRVLEDGVNYATGNLPGCSFSIFL
AGGGCTCTGGCTCACGGAGTGAGAGTGCTCGAGGATGGCGTCAACTATGCCACAGGC^^ 

HepCla #169 

SKFGYGAKDVRCHARKAVAHINSVWKDLLE 
AGCAAATTCGGATACGGAGCCAAAGACGTCAGGTGTCACGCTAGGAAAGCCG 

HepCla #28 

TPGAKQNIQLINTNGSWHINSTALNCNESL 
ACCCCTGGCGCTAAGCAAAACATTCAGCTCATCAATACCAATGGCTCCTGGCATATCAATAGCACAGCCCTCAACTGTAACGAAAGCCTC

HepCla #30 

NTGWLAGLFYQHKFNSSGCPERLASCRRLT
AACACAGGCTGGCTC^CTGGCCTCTTCTATCAGCATAAGTTTAACrCCAGCGGATGCCCTGAGAG 

HepCla #49

VVLLFLLLADARVCSCLWMMLLISQAEAAL
GTGGTCCTGCTCTTCCTCCTGCTCGCCGATGCCAGAGTGTGTAGCTGTCTGTGGATGATGCT 

HepCla #192 

DCEIYGACYSIEPLDL



HepCla #73 

WTVYHGAGTRTIASPKGPVIQMYTNVDQDL 
TGGACAGTGTATCACC^AGCCGGAACCAGAACCATTGCCTCCCCCAAAGGCCC^^ 

HepCla #101 

YRFVAPGERPSGMFDSSVLCECYDAGCAWY
TACAGATTCX5TCGCCCCIX^CGAAAGGCCTAGCGGAATGTTTGACTCCAGCGTCCTGTGTGAGTGTTACG 

HepCla #45

RSELSPLLLSTTQWQVLPCSFTTLPALSTG
AGGTCCGAGCTCAGCCCTCTGCTCCTGTCCACCACACAGTGGCAGGTCCTGCCTTGCTCCTTCACAACCCT

HepCla #195 

LRKLGVPPLRAWRHRARSVRARLLARGGRA
CTGAGAAAGCTCGGCGTCCCCCCTCTGAGAGCCTGGAGGCATAGGGCTAGGTCCGTGAGAGCCAGACTGCTCGCCAGAGGCGGAAGGGCT
HepCla #121 

SPLTTSQTLLFNILGGWVAAQLAAPGAATA
AGCCCTCTGACAACCTCCCAGACACTGCTCTTCAATATCCTCGGCGGATGGGTCGCCGCTCAGCTC 

HepCla #61 

LWILQASLLKVPYFVRVQGLLRICALARKM
CTGTGGATCCTCCAGGCTTlGCCTCCrGAAAGTGCCTTA 

HepCla #137 

VKNGTMRIVGPRTCRNMWSGTFPINAYTTG
GTGAAAAACGGAACCATGAGGATTGTGGGACCCAGAACCTGTAGGAATATGTGGAGCGGAACCTTTCCCA^ 

HepCla #92 

EVALSTTGEIPFYGKAIPLEVIKGGRHLIF
GAGGTCGCCCTCAGCACAACCGGAGAGATTCCCTTTTACGGAAAGGCTATCCCTCTGGAAGTGATTAAGGGAGGCAGACACCTCATCTTT

HepCla #188 

LTRDPTTPLARAAWETARHTPVNSWLGNII
CTGACAAGGGATCCCACAACCCCTCTGGCTAGGGCTGCCTGGGAGACAGCCAGACA 

HepCla #140 

RVSAEEYVEIRRVGDFHYVTGMTTDNLKCP
AGGGTCAGCGCTGAGGAATACGTCGAGATTAGGAGAGTGGGAGACTTTCACTATGTGACAGGCATGACCACAGACAATCTG 

HepCla #155 

PVVHGCPLPPPRSPPVPPPRKKRTVVLTES 
CCCGTTOTGCATGGCTGTCCCCTCCCCCCTCCCAGAAGCCCTCCCGTCCCCCCTCCCAGAAAGAAAAG^ 

HepCla #157 

TLSTALAELATKSFGSSSTSGITGDNTTTS
ACCCTCAGCACAGCCCTCGCCGAACTGGCTACCAAAAGCTTTGGCTCCAGCTCCACCTCCGG 

HepCla #135

VSCQRGYKGVWRGDGIMHTRCHCGAEITGH 
GTGTCCTGCCAAAGGGGATACAAAGGCGTCTGGAGAGGCGATGGCATTATGC^ 

HepCla #20 

VFLVGQLFTFSPRRHWTTQGCNCSIYPGHI
GTGTTTCTGGTCGGCCAACTGTTTACCTTTAGCCCTAGGAGACACTGGACCAC^ 

HepCla #123 

FVGAGLAGAAIGSVGLGKVLVDILAGYGAG
TTCGTCGGCGCTGGCCTCGCCGGAGCCGCTATCGGAAGCGTCGGCCTCGGCAAAGTGCT

HepCla #133 

DIWDWICEVLSDFKTWLKAKLMPQLPGIPF
GACATTTGGGATTGGATTTGCGAAGTGCTCAGCGATTTCAAAACCTGGCTGAAAG 

HepCla #15 

NSSIVYEAADAILHTPGCVPCVREGNASRC 
AACTCCAGCATTGTGTATGAGGCTGCCGATGCCATTCTGCATACCCCTGGCTGTGTGCCTTGCGTCAGGG 

HepCla #31 

SSGCPERLASCRRLTDFDQGWGPISYANGS
AGCTCCGGCTGTCCCGAAAGGCTCGCCTCCTGCAGAAGGCTCACCGATTTCGATCAGGGATGGG^

HepCla #178 

RTEEAIYQCCDLDPQARVAIKSLTERLYVG
AGGACAGAGGAAGCCATTTACCAATGCTGTGACCTCGACCCTCAGGCTAGGGTCGCCATTAAGTCCCTGAC^ 

HepCla #69 

VSKGWRLLAPITAYAQQTRGLLGCIITSLT
GTGTCCAAGGGATGGAGACTGCTCGCCCCTATCACAGCCTATGCCCAACAGACAAGGGGACTGCTC 

HepCla #191 

FFSVLIARDQLEQALDCEIYGACYSIEPLD
TTCTTTAGCGTCCTGATTGCCAGAGACCAACTGGAACAGGCTCTGGATTGCGAAATCTATGGCGCTTGCTATAGCATTGAGCCTCTGGAT

HepCla #142

CQVPSPEFFTELDGVRLHRFAPPCKPLLRE
TGCCAAGTGCCTAGCCCTGAGTTTTTCAGAGAGCTCGACGGAGTGAGAC

HepCla #182

TCYIKARAACRAAGLQDCTMLVCGDDLVVI 
ACCTGTTACATTAAGGCTAGGGCTGCCTGTAGGGCTGCCGGACTGCAAGACTGTACCATGCTGGTCTGCGGAGACGATCTGGTCGTGATT

HepCla #86 

IDPNIRTGVRTITTGSPITYSTYGKFLADG
ATCGATCCCAATATCAGAACCGGAGTGAGAACCATTACCACAGGCTCCCCCATTACCTATAGCACATACGGAAAGTTTCTGGC^ 

HepCla #44

CNWTRGERCDLEDRDRSELSPLLLSTTQWQ 
TGCAATTGGACAAGGGGAGAGAGATGCGATCTGGAAGACAGAGACAGAAGCGAACTGTCCCCCCTCCTGCTCAGCACAACC 

HepCla #22 

TGHRMAWDMMMNWSPTAALVMAQLLRIPQA
ACCGGACACAGAATGGCTTGGGATATGATGATGAATTGGTCCCCCACAGCCGCTCTTGGTCATGGCTCAGCTCCTC 

HepCla #127 

PGALVVGVVCAAILRRHVGPGEGAVQWMNR
CCCX^AGCCCTCGTGGTCGGCGTCGTGTGTGCCGCTATCCTCAGGAGACAC^ 

HepCla #149

HDSPDAELIEANLLWRQEMGGNITRVESEN
CACGATAGCCCTGACGCTGAGCTCATCGAAGCCAATCTGCTCTGGAGACAGGAAATGGGAGGCAATATCACAAGGGTCGA 

HepCla #105 

EGVFTGLTHIDAHFLSQTKQSGENFPYLVA 
GAGGGAGTGTTTACCGGACTGACACACATTGACGCTCACTTTCTGTCCCAGACAAAGCAAAGCGGAGAG

HepCla #5

RGRRQPIPKARRPEGRTWAQPGYPWPLYGN



HepCla #173 

LIVFPDLGVRVCEKMALYDVVSKLPLAVMG 
CTGATTGTGTTTCCCGATCTGGGAGTGAGAGTGTGTGAGAAAATGGCTCTGTATGACGTCGTGTCCAAGCT^ 

HepCla #12 

YATGNLPGCSFSIFLLALLSCLTVPASAYQ
TACGCTACCCK5AAACCTCCCCGGATGCTCCTTCT<XATCTT^ 

HepCla #124 

LGKVLVDILAGYGAGVAGALVAFKIMSGEV
CTGGGAAAGGTCCTGGTCGACATTCTGGCTGGCTATGGCGCTGGCGTCGCCGGAGCCCTCGTGGCT 

HepCla #160 

SYSSMPPLEGEPGDPDLSDGSWSTVSSEAG 
AGCTATAGCTCCATGCCTCCCCTCGAGGGAGAGCCTGGCGATCCCGATCTGTCCGACGGAAGCTGGAGCACAGTGTCCAGCG 

HepCla #150 

RQEMGGNITRVESENKVVILDSFDPLVAEE
AGGCAAGAGATGGGCGGAAACATTACCAGAGTGGAAAGCGAAAACAAAGTGGTCATCCTCGACTCCTTCGATC 

HepCla #75 

VGWPAPQGSRSLTPCTCGSSDLYLVTRHAD
GTGGGATGGCCTGCCCCTCAGGGAAGCAGAAGCCTCACCCCTTGCACATGCGGAAGCTCCGACCTCTACCTC^ 

HepCla #88

GCSGGAYDIIICDECHSTDATSILGIGTVL
GGCTGTAGCGGAGGCGCTTACGATATCATTATCTGTGACGAATGCCATAGCACAGACGCTACCTCCATCCTCGGCATTGGCACAGTGCTC

HepCla #99 

TFTIETTTLPQDAVSRTQRRGRTGRGKPGI
ACCTTTACCATTGAGACAACCACACTGCCTCAGGATGCCGTCAGCAGAACCCAAAGGAGAGGCAGAACCG 

HepCla #40

DCFRKHPEATYSRCGSGPWITPRCLVDYPY
GACTGTTTCAGAAAGCATCCCGAAGCCACATACTC^

HepCla #201 

LAAGVGIYLLPNRAA
CTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAATAGGGCTGCC

HepCla #163

ALVTPCAAEEQKLPINALSNSLLRHHNLVY
GCCCTCGTGACACCCTGTGCCX5CTGAGGAACAGAAACTGCCTATCAATGCCCTCAGCAATA 

HepCla #132 

ISSECTTPCSGSWLRDIWDWICEVLSDFKT
ATCTCCAGCGAATGCACAACCCCTTGCTCCGGCTCCTGGCTC^^ 

HepCla #134 

WLKAKLMPQLPGIPFVSCQRGYKGVWRGDG
TGGCTCAAGGCTAAGCTCATGCCTCAGCTCCCCGGAATCCCTTTCGTCAGCTGTCAGAGAGGCTATAAGGGAGTGTGG 

HepCla #41 

SGPWITPRCLVDYPYRLWHYPCTINYTIFK
AGCGGACCCTGGATCACACCCAGATGOTTGTGGATTACCCTTACAGAC^ 

Artificial Protein: 



VIPVRRRGDSRGSLLSPRPISYXKGSSGGPARRGREILUGPADGMVSKGWRLLAPITAY 

KLITWGADTAACGDI INGLPVSLLCPAGHAVGIFRAAVCTRGVAKAVDFI PVCVVIVGRIVLSGKPAIIPDREVLYREFDEMPCTPLPAPNYTFAIjWR 
VSAEEYVEIRRVGDALYDVVSKLPLAVMGSSYGFQYSPGQRVEFISWCLWWI^YFLTRVE 

CFAWYLLPP I IQRLHGLS AFSLHS YS PGE I NRVAACNPPLVETWKKPDYE P PWHGCPLPPPRS P PG VGSS I ASWAI KWE YWLLFLLLADARVCSLN 
NTRP P IX3NWFGCTWMNSTG FTKVCG AP PFTEAMTR YSA P PGD P PQ P E YDLE LIT S CS S W PLLLLLLALPQRA YALDTE VAA S CGG WLQQTRG LLGC I 
ITSLTGRDKNQVEGEVQIVSSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAN 
AGRTTSGLVS LI^EVTLTHPVTKYI MTCMSADLJE VVTSTWVLWGLM^ 

LDLSIAYFSMVGNWAKVLVVLLLFAGVDAETHVTRI^AJ^GSPPSMASSSASQLSAPSLKATCTA 

EPEPDVAVLTSMLTDPSHITAEAAGRDSVTPIDTTIMAKNEVFCVQPEKGGRKPARYAAQGYTCVLV^ 

DCPNSSIVYEAADAILHTSSYGFQYSPGQRVEFLVQAWKSKKTPMGFSDTAACG 

AELI EANLLWNPAI ASLMAFTAAVTSPLTTSQTLLFNI IiGLVOAWKSKKTP^FSYDTRCFT)STVTESDIDEREISVPAEILRKSRRFA0ALPVWARP 
DYMFAPTLWARMI LMTHFFSVLI ARDQLEQALSVI PTSGDVVWATDAXJ^GYTGDFDSVIDCTHSKKKCDELAAKLVAl/SINAVAYYRGIXJVVLPCSF 
TTLPALSTGLI HLHQNI VDVQYLYKGRWPGAVYALYGMWPLLLLLLALPQRAYS PI TYSTYGKFLADGGCSGGAYDI IICDECARSVRARLLARGGR 
AAICGKYLFNWAVRTKKAVAHINSVWKDLLEDSVTPIIJTTIKAKNEFTPSPVVVGTTDRSGAPTYSW 
VATRDGKLQDCTMLVCGDDLVVI CESAGVQEDAASlJtA VAGAL VAFKIMSGEVPSTEDLVNLL PAIL 

ELT P AETTVRLRA YMNT PG LPVCQDHLE FW PQ PEYDLELI TS CS S NVS VAHDG AG KR VYYLG KV I DTLTCG F ADLMGY I PLVGAPLGGAAAI PLEVI K 
GGRHLI FCHSKKKCDEiiAAKLVGGVLAALAAYCLSTGCWI VGR I VI^GKPACESAGVQEDAASLRAFTEAMTRYSAPPGDPGWFTAGYSGGDIYHSV 
SHAR PRWFWFCLLLSSSTSG I TGDNTTTSSE PAPSGCP PDSDAERTQRRG RTGRGK PG I YR FVAPGER PSGMFD VRMYVGGVEHRLEAACNWTRGERC 
DLEDRDEAQLHVWVppIjNVRGGRDAVILLMCVVHPTLGVRATRKTSERSQPRGRRQPI PKARRPEGNVSVAHDGAGKRVYYLTRDPTTPLARAAWESE 
PA PSGCP PDSDAES YSSMPPLEGEPGD PI GGHYVQMA 1 1 KLGALTGTYVYNHLT PLRD PSTEDLVNLLPA I LS PGALWGWCAAILRILDMIAGAHW 
GVLAG I AYFSMVGNWAKVLVEGCGWAGWLLS PRGSR PSWG PTDPRRRSRNWTTQGCNCS I Y PGH I TGHRMAVTDMMMNWSPWVAMTPTVATRDGKL.PAT 
QLRRHI DLLVGSRLWHYPCTI NYTI FKVRMYVGGVEHRLEAAVFCVQPEKGGRKPARLI VFPDLGVRVCEKMKGYI PLVGAPLGGAARALAHGVRVLE 
DG VNGGNAGRTTSGLVS LLT PG AKQN I QL I NTNGLA LLSCLTVP AS A YQ VRN STG LYHVTNDC PG RD KNQ VEGEVQ I VS TAAQT F LATC I NG VCP ATQ 
LRRHIDI^VGSATI^SALYVGDIiCGSHAPTGSGKSTKVPAAYAAC^YKVLVL^ 

WTGALVTPCAAEEQKLP I ALDTEVAASCGG WLVGLMALTLS P YYKRYWMNSTG FTKVCGAP PCV I GGAGNNTLHCPTSVEEACSLTPPHSAKSKFGY 
G AKD VRCHAR I SG I Q YLAGLSTL PGN PA I AS LMAFT AA VTQ I VGG VYLLPR RG PRLG VRATR KTS ERSQP LHS YS PGE I NRVAACLRK LG VP P LRAWR 
HRTARHTP VNSWLGNI I MFA PTLWARMI LMTH ENLETTMRS PVFTDNS S P PA VPQS FQVAHLATPPGSVTVPHPNIEEVAI^TTGEI PFYGKLVFDIT 
KLLLAVFG PLWI LQASLLKV PY FVTAALVMAQLLR I PQAI LDMI AGAHWGVLAGCNTCVTQTVDFSLDPTFTI ETTTLPQDAVSHGPTPLLYRLGAVQ 
NEVTLTHPWKYIOTCARVAIKSLTERLYVGGPLTNSRGENC^^ 

GDSRGSLlJnWSGTFPINAYTTGPCTPLPAPNYTFALWHSTDATS I LG IG1VUX)AETAGARLWIATYVPESDAAARVTAI I^SLTVTQLLRRLHQW 
RPSWGPTDPRJUtSRNLGKVIDTLTCGFADUSPDQRPYCWHYPPKPCGIVPAKSVCGPVYCEECSQHLPYIE^ 
AQAPPPSWDQMWKCLIRLKPTLCGIVPAKSVCGPVYCFTPSPVWGTTDRSGSSLTVTQL^^ 
EDWCCSMSYSWTGWDQ!WKCLIRLKPTLHGPTPLLYRLGAVQNLAEQFKQra 

LSTLPGLIAFASRGNHVSPTHYVPESDAAARVTAILATLCSALYVGDLCGSVFLVGQLFTFSPRRHSSVIXTEC^ 
VAAQLAAPGAATAFVGAGLAGAAIGSVGSWHINSTALNCTNESLNTGWLAGLFY 
RKTKRNTWRPQDVKFPGGGSQTKQSGENFPYLVAYQATVCARAQAPPPSAPTY'SWGANDTO^ 
TALAEIATKSFGSTTSRSACQRQKTCWFDRLQVl^SHYQDVLDQAETAGARLWIAT^^ 

LDGVLKLT PI AAAGRLDLSGWFTAGYSGGDI YHSASRQAEVI A PA VQTNWQKLEVFWAKHMWN FCRASG VLTTSCGNTLTCYI KARAACRAAGLFDRL 
QVLDSHYQDVLKEVKAAASKVKANLLGPLTNSRGENCGYRRCRASGVLTTSC^ 

VGSQLPCEPEPDVAVLTSKEVKAAASKVKANLLSVEEACSLTPPHSAKGRDAVILLWCVVH PTLVFDITKLLLAVFGPMLTDPSHITAEAAGRRLiARG 
SPPSMASSSASPRPISYLKGSSGGPLLCPAGHAVGIFRAADFDGCWGPISYANGSGPDQRPYCWHYPPKPRHVGPGEG 

THCLWMMLLI SQAEAALENLVI LNAASLAGTHI I PDREVLYREFDEMEECSQHLPYI EQGMMLI HLHQNI VDVQYLYG VGSS I ASWAI KWEYVSHARP 
RWFWFCLLLLAAGVGI YLLPNRAAAATLGFGA YMS KAHGI DPN I RTGVRT I TTGRVQGLLR I CALARKMIGGH YVQMAI IKLGARRFAQALPVWARPD 
YNP PLVETWKKPDYEPTAAQTFLATC I NGVCWTVYHGAGTRT I AS PWAHNGLRDLA VAVEPWFSQMETKLI TWGAKGPVI QMYTNVDQDLVGWPAPQ 
GSRSLTPCKWI LDSFDPLVAEEDEREI SVPAEI LRKSLTGTYVYNHLTPLRDWAHNGLRDLAVAVEPVCTRGVAKAVDFI PVENLETTMRSPVFTDN 
AIX3INAVAYYT?GLDVSVIPTSGDVVVVATDMSADLEVVTSTWVLVGGVLAALAAYCXSTG 

V KFPGGGQ I VGG VYLLPRRG PRRALAHG VR VLEDG VNY ATGN LPGC S FS I F LS K FGYG AKD VRCHAR KAV AH I N S VWKD LLET PG AKQN I QL I NTNG S 
WHINSTALNCNESLNTGWIJVGLFYQHKFNSSGCPERIASCRRLTV^ IQRLH 
GLS AF S WT VYHG AGTRT I AS P KG PV I QMYTNVDQDLYR FVA PGER P SGMFDS S VLCECYD AGCA WY R S ELS P LLLSTTQWQVL PCS FTTL P ALSTGLR 
KLG VP PLRAWRHRARS VRAR LLARGG RAS PLTTSQTLLFNI LGGWVAAQLAAPGAATALWI LQASLLKVPYFVRVCCLLRICALARKMVKNGTMR I VG 
PRTCRNMWSGTFPINAYTTGEVALSTTGEI PFYGKAI PLEVI KGGRHLI FLTRDPTTPLARAAWETARHTPVNSWLGNI IRVSAEEYVEIRRVGDFHY 
VTGMTTDNLKCP PWHGCPLP PPRS P PVPPPRKKRTWLTE STLSTALAELATKS FGS SSTSG I TGDNTTTSVSCQRGYKGVWRGDGIMHTRCHCGAE 
I TGHVFLVGQLFTFS PRRHWTTQGCNCS I YPGH I FVGAGLAGAAI GS VGLGKVLVDI LAGYGAGDI WDWI CEVLSDFKTWLKAKLMPQLPGI PFNSSI 
VY E AADA I LHT PGCV PCVR EGNAS RCS SGC PE RLAS C RR LTD FDQGWG PIS YANG S RT E EA I YQCCDLD PQ AR VA IKS LTE RLYVGVS KGWR LLA P I T 
AYAQQTRGLLGCIITSLTFFSVLIARDQLEQALDCEIYGACYSIEPLDCQVPSPEFFTELDGVRLHRFAPPCKPLLRETCYIKARAACRAAGLQDCTM
LVCGDDLVVIIDPNIRTGVRTITTGSPITYSTYGKFLADGCNWTRGERCDLEDRDRSELSPLLLSTTQWQTGHRMAWDMMMNWSPTAALVMAQLLRIP
QAPGAL.WGWCAAILRRHVG PGEGAVQVWNRHDSPDAELIEANLLWROEMGGNITRVESENEGVFTGLTHIDAHFLSQTKQSGENFPYLVARGRRQP 
I PKARRPEGRTWAQPGYPWPLYGNLI VFPDLGVRVCEKMALYDWSKLPLAVMGYATGNLPGCSFS I FL1ALL5CLTVPASAYQLGKVLVDILAGYGA 
GVAGALVAFKIMSGEVSYSSMPPLEGEPGDPDLSDGSWSTVSSEAGRQEMGGNITRVESENKW 

YLVTRHADGCSGGAYDIIICDECHSTDATSILGIGTVLTFTIETTTLPQDAVSRTQRRGRTGRGKPGIDCFRKHPEATYSRCGSGPWITPRCLVDYPY
LAAGVGIYLLPNRAAALVTPCAAEEQKLPINALSNSLLRHHNLVYISSECTTPCSGSWLRDIWDWICEVLSDFKTWLKAKLMPQLPGIPFVSCQRGYK
GVWRGDGSGPWITPRCLVDYPYRLWHYPCTINYTIFK



Artificial DNA: 

GTGATTCCCGTCAGGAGAAGGGGAGACTCCAGGGGAAGCCTCCTGTCCC^ 

GGGAAGGGAAATCCTCCrGGGACCCGCTGACGGAATGGTCAGCAAAGGCTGGAGGCTCCTGGCT 

CTCCCCCTTGCAAACCCCTCCTGAGAGAGGAAGTGTCCTTCAGAGTGGGACTCCATGAGTATCCCGTCGGCTCCGT^ 
AAGCTC^TCAC^TGGGGAGCCGATACCGCTGCCTGTGGCGATATCATTAACGGACTGCCTGTGTCCCTC 
CTTTAGGGCTGCCGTCTGCACAAGGGGAGTGGCTAAGGCTGTGGATTTCATTCCCGTCTGCGTC^ 
CTGCCATTATCCCTGACAGAGAGGTCCTGTATAGGGAATTCGATGAGATC 

GTGTCCGCCGAAGAGTATGTGGAAATCAGAAGGGTCGGCGATGCCCTCTACGATGTGGTCAGCAAACTGCCTCTC 

gggtgcctcccctcaacgtcaggggagagaatctggtcatcctcaacgctgcctccctggctggcacacacggact^ 

tgctttgcctggtacctcctgcctcccattatccaaaggctccacggactgtccgcctttagcctccact 

ggctgccrgtaaccctcccctcgtggaaacctggaagaaacccgattacgaaccccc 

ctggcgtcggctccagcattgcctcctgggctatcaaatgggaatacgtcgtgctcctgtttctgctcct 

aacacaaggcctcccctcx;gcaattggtttggctgtacctggatgaatagcacaggctttaccaaagtgtgtggcgct 



TCGCCCTCCCCCAAAGGGCTTACGCTCTGGATACCGAAGTGGCTGCCTCCTGCGGAGGCGTCGTGCTCCAGCAAA 
ATCACAAGCCTCACCGGAAGGGATAAGAATCAGGTCGAGGGAGAGGTCCAGATTGTGTCCAGCTCCCCCCCTGCCG 

AAGGCGTCTTCACAGGCCTCACCGATATGGATGCCCATTTCCTCGTGCTCCTGCTCTTCGCTGG 
GCCXXJAAGGACAACCTCCGGCCTCGTGTCCCTGCTCGAGGTCACCCTCA 

GGTCGTGACAAGCACATGGGTCCIX^TCGTGGGACTGATGGCCCTCACCCTCAGCCCTTACTATAAGA^ 

AATACTTTCTGAC AAGGGTCGCCATTTG CGGAAAGTATCTGTTT AACTGGGCCGTCAGG ACAAAGCTC AAGCTCACCCCTATCGCTGGCG CTGGCAG A 
CTGGATCTGTCX^TCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAAGTGCTCGTGGTCCTGCT 

GACAAGGCTCGCCAGAGGCTCCCCCCCTAGCATGGCCTCCAGCTCCGCCTCCCAGCTCAGCGCTCCCTCCCTGAAAGCCAC^ 
TCGTGTCCTTCCTCGTGTTTTTCTGTTTCGCTTGGTATCTGAAAGGCAGATGGGTCCCCGGAGCCGTCTACGCT 

GAGCCTGAGCCTGACGTCGCCGTCCTGACAAGCATGCTGACAGACCCTAGCCATATCACAGCCGAAGC<X3CTGGCAGAGACTCCX3TGACACCCAT 

CACAACCATTATGGCTAAGAATGAGGTCTTCTGTGTGCAACCCGAAAAGGGAGGCAGAAAGCCTGCGAGATACGCTGCCCA^ 

TCCTGAATCCCTCCGTGGCTGCGACACTGGGATTCGGAGCCTATATGTCCAAGGCTCACGGAGTGAG^ 

GACTGTCCCAATAGCTCCATCGTCTACGAAGCCX5CTGACGCTATCCTCCACACAAGCTCCTACGGATTCCAATACTCCCCG 

CCTCGTGCAAGCCTGGAAGTCCAAGAAAACCCCTATGGGATTCTCCGACACAGCCGCITGCGGA 

GAGGCAGAGAGATTCTGCTCGGCCCTGCCGATGGCATGAGCCAACTGTCCGCCCCTAGCCTCAAGGCTACCTGTACCGCT 
GCCGAACTGATTGAGGCTAACCTCCTGTGGAACCCTCCCATTGCCTCCCT 

CCTCCTGTTTAACATTCTGGGACTGGTCCAGGCTTGGAAAAGCAAAAAGACACCCATGGGCTTTAGCTATGACACAA 

CAGAGTCCGACATTGACGAAAGGGAAATCTCCGTGCCTGCCGAAATCCTCAGGAAAAGCAGAAGGTTTGGCCAAGCCCTC 

GACTATATGTTTGCCCCTACCCTCTGGGCTAGGATGATCCTCATGACACACTTTTTCTCCGTGCT 

CGTCATCCCTACCTCCGGCGATGTGGTCGTGGTCGCCACAGACGCTCTGATGACCGGATACACAGGCGATCT 

AAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGGCTCTGGGAATCAATGCCGTCGCCTATTACAGAGGCCTCGACGT 

ACCACACTGCCTGCCCTCAGCyVCAGGCCTCATCCATCTGCATCAGAATATCGTCGACGTCCAGTATCTGTATAAGGGAAGGTGG 

GTATGCCCTCTACGGAATGTGGCCCCTCCTGCTCCTGCTCCTGGCTCTGCCTCAGAGAGCCTATAGCCCT 

TCGCCGATGGCGGATGCTCCGGCGGAGCCTATGACATTATCATTTGCGATGAGTGTGGCAGAAGCGT^ 

GCCGCTATCTGTGGCAAATACCTCTTCAATTGGGCTGTGAGAACCAAAAAGGCTGTGGCTCACATTAACTC 

CCTCACCCCTATCGATACCACAATCATGGCCAAAAACGAATTCACACCCTCCCCCGTCGTGGTCGGCACAACCGATAG^ 

CCTGGGGAGCCAATGACACAGACX5TCTTCGTCCCCGGATGCGTCCCCTGTGTGAGAGAGGGAAA 

GTGGCTACCAGAGACGGAAAGCTCCAGGATTGCACAATGCTGGTGTGTGGGGATGACCTCGTGGTCATCTGTGAGTC 

CGCTAGCCTCAGGGCTGTGG CTGGCG CTCTGGT CGCCTTTAAG ATT ATGTCCGGCG AAGTGCCTAG CACAG AGGATCTGGTCAACCTCCTGCCTGCC A 

TTCTGTCCTACGATACCAGATGCTTTGACTCCACCGTCACCGAAAGCGATATCAGAACCGAAGAGGCTATCTATC^ 

GAGCTCACCCCTGCCGAAACCACAGTGAGACTGAGAGCCTATATGAATACCCCTGGCCT^ 

CGAATACGATCTGGAACTGATTACCTCCTGCTCCAGCAATGTGTCCGTGGCTCACGATCGC 

ATACCCTCACCTGTGGCTTTGCCGATCTGATGGGCTATATCCCTCTGGTCGGCGCTCCCCTCGGCGGAGCCG 

GGCGGAAGGCATCTGATTTTCrrGTCACrCCAAGAAAAAGTGTGACGAACTGGCTGCCAAACTGGTCGG 

CCTCAGCACAGGCTGTGTGGTCATCGTCGGCAGAATCGTCCTGTCCGGCAAACCCGCTTGCGA^ 

AGGCATGCCAGACCCAGATGGTTTTGGTTTTGCCrCCTC 

•CCCTAGCGGATGCCCTCCCGATAGCGATGCCGAAAGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCCGGAATGT^ 

GAGAGAGACCCTCCGGCATGTTCGATGTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCTCGAGGCTCCCTGTAACT^ 

GACCTCXSAGGATAGGGATGAGGCTCAGCTCCACGTCTGGGTCCCCCCTCTGAATGTGAGAGGCGGAAGGGATGCCGTCA 

GCATCCCACACTGGGAGTGAGAGCCACAAGGAAAACCTCCGAGAGAAGCCAACCCAGAGGCAGAAGGCAACCCATTCCCAAAGC 

GAAACGTCAGCGTCGCCCATGACGGAGCCGGAAAGAGAGTGTATTACCTCACCAGAGACCCTACCACACCCCTCGCCAGAGCC 

CCCGCTGCCTCCGGCTGTCCCCCTGACTCCGACGCTGAGTCCTACTCCAGCATGCCCCCTCTGGAAGGCGAACGCGGAGACCCTATCGGAGG 

CGTCCAGATGGCCATTATCAAACTGGGAGCCCTCACCGGAACCTATGTGTATAACCATCTGACACCCCTC^GGGATCCCTCCACCGAAGACCT 



GGCGTCCTGGCTGGCATTGCCTATTTCTCCATGGTCGGCAATTGGGCTAAGGTCCTGGTCGAGGGATGC 

GGGAAGCAGACCCTCCTGGGGACCCACAGACCCTAGGAGAAGGTGCAGGAATTGGACAACCCAAGGCTGTAACTGTAGC 

CAGGCCATAGGATGGCCTGGGACATGATGATGAACTGGAGCCCTTGGGTCGCCATGACCCCTACCGTCGCCACAAGGG
CAGCTCAGGAGACACATTGACCTCCTGGTCGGCTCCAGGCTCTGGCATTACCCTTGCACAATCAATTACACAAT

CGGAGTGGAACACAGACTX3GAAGCCGCTGTGTTTTGCGTCCAG 

TCAGGGTCTGCGAAAAGATGATGGGATACATTCCCCTCGTGGGAGCCC 

GACGGAGTGAATGGCGGAAACGCTGGCAGAACCACAAGCGGACTGGTCAGCCTCCTGACACCCGGAGCCAAACAGAATATCC^ 
CGGACTGGCTCTGCTCAGCTGTCTGACAGTGCCTGCCTCCGCCTATCAGGTCAGGAATAGCACAG^ 
GAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATCGTCAGCACAGCCGCTCAGAC^ 
CTGAGAAGGCATATCGATCTGCTCGTGGGAAGCGCTACCCTCT^ 

CAAAAGCACAAAGGTCCCCGCTGCCTATGCCGCTCAGGGATACAAAGTGCTCGTGCTCAAC 

GGCCCCTCTACGGAAACGAAGGCTGTGGCTGGGCCGGATGGCTCCTGTCCCCCAGAGGCTCCACCGAAGACGTCC 

TGGACAGGCGCTCTGGTCACCCCTTGCGCTGCCGAAGAGCAAAAGCTCCCCATTGCCCTCGACA 

GGTCGGCCTCATGGCTCTGACACTGTCCCCCTATTACAAAAGGTATTGGATGAACTCCACCGGATT 

TTGGCGGAGCOSGAAACAATACCCrrCCACTGTCCCACAAGCGTCGAGGAAGCCTGTAGCCTCACCCCTCCCCATAGC^ 

GGCGCTAAGGATGTGAGATGCCATGCCAGAATCTCCGGCATTCAGTATCT 

GGCTTTCAC AGCCG CTGTG ACACAGATTGTGGG AGGCGTCT ACCTCCTGCCT AGG AGAGG CCCT AGGCTCGG CGTCAGGGCTACCAG AAAGAC AAGCG 

AAAGGTCCCAGCCTCTGCATAGCTATAGCCCTGGCGAAATCAATAGGGTCGCCGCTTGCCTCAGGAAACTGGGA 

CACAGAACCGCTAGGCATACCCCTGTGAATAGCTGGCTGGGAAACATTATCATGTTCG 

GAATCTGGAAACC^CAATGAGAAGCCCTGTGTTTACCGATAACTCCAGCCCTCCCGCTGTGCCTCAG 

CTGGCTCCGTGACAGTGCCTCACCCTAACATTGAGGAAGTGGCTCTGTCCACCACAGGCGAAATCCC^ 

AAGCTCCTGCTCGCCGTCTTCGGACCCCTCTGGATTCTGCAAGCCTCCCTGCT 

GCTCAGGATTCCCCAAGCCATTCTGGATATGATTGCCGGAGCC 

CCCTGGATCCCACATTCACAATOSAAACCAC^CCCTCCCCCAAGACGCT^ 

AACGAAGTGACACTGACACACCCTGTGACAAAGTATATCATGACCTGTGCCAGAGTGGCTATCAAAAGCCTCAC 

CCTCACCAATAGCAG AGGCG AAAACTGTGGCTAT AGGAGATGCGTCATCGG AGG CGCTGG CAAT AACACACTGCATTGCCCTACCG ATTG CTTT AGG A 
AACACCCTGAGGCTACCTATAGCAGATGCGGAACCTGTGGCTCCAGCGATCTGTATCTGGTCA 
GGCGATAGCAGAGGCTCCCTGCTCAACATGTGGTCCGGCACATTCCCTATCAATGCCTATACCACAG 
CACATTrcCTCTGTGGCACTCCACCGATGCCACAAGCATTCTGGGAATCGGAACCCTCC^^ 

TCGCCACATACGTCCCCGAAAGCG ATG CCGCTGCCAG AGTGAC AG CCATTCTGTCCAG CCTCACCGTCACCCAACTGCTCAGG AG ACTGCATC AGTGG 
AGGCCTAGCTGGGGCCCTACCGATCCCAGAAGGAGAAGCAGAAACCTCGGCAAAG 

CCAAAGGCCTTACTGTTGGCATTACCCTCCCAAACCCTGTGG CATTGTGCCTGCCAAAAGCX3TCTGCGG ACCCGTCTACTGTG AGGAATGCTCCC AG C 
ATCTGCCTTACATTGAGCAAGGCATGATCCTCGCCGAACAGTTTAAGCAAAAGGCT 

GCCCAAGCCCCTCCCCCTAGCTGGGACCAAATGTGGAAGTGTCTGATTAGGCTCAAGCCTACCCTCTGCGGAATCGTCC 
CCCTGTGTATTGCTTTACCCCTAGCCCTGTGGTCGTGGGAACCACA 

AATGGATTAGCTCCGAGTGTACCACACCCTGTAGCGGAAGCTGGCTCAGAGACCTCAGCGATGGCTCCTGGTCCACCGTCA 

G AGG ATGTGGTCTGCTGT AG CATG AG CTAT AG CTGG ACCGG ATGGG ATC AG ATGTGGAAATG CCTCATCAG ACTG AAACCC AC ACTGCATGGCCCTAC 
CCCTCTGCTCTACAGACTGGGAGCCGTCCAGAATCTGGCTGAGCAAT^ 

TCATCGCTCCCGCTGTGCAAACCAATTGGCAAAAGCTCXSAGGTCTTCTGGGCCAAACACATGTGG 
CTGTCCACCCTCCCCGGACTGATTGCCTTTGCCTCCAGGGGAAACCATGTGTCCCCCACACACTA 
CGCTATCCTCGCCACACTGTGTAGCGCTCTGTATGTGGGAGACCT 

ATAGCTCCGTGCTCTGCGAATGCTATGACGCTGGCTGTGCCTGGTACGAACTGACACCCGCTGAGA 
GTGGCTGCCCAACTGGCTGCCCCTGGCGCTGCCACAGCCTTTGTC 

CTCCACCGCTCTGAATTGCAATGAGTCCCTGAATACCGGATGGCTCGCCGGACTGTTTTACCAACACAAATTCAATAACGCTCTG 

TCAGGCATCACAATCTGGTCTACTCCACCACAAGCAGAAGCGCTTGCCAAAGGCAAAAGAAAGTGACAGCCGCTATGTCCACC 

AGGAAAACCAAAAGGAATACCAATAGGAGACCCCAAGACGTCAAGTTTCCCGGAGGCGGAAGCCAAACCAAACAGTCCGGTC 

TCAACAATACCAGACCCCCTCTGGGAAACTGGTTCGGATGCACAGTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTGCTCACCGAA 
ACCGCTCTGGCTGAGCTCGCCACAAAGTCCTTCGGAAGCACAACCT 

GCTCGACTCCCACTATCAGGATGTGCTCGACCAAGCCGAAACCGCTGGCGCTAGGCTCGTGGT 

TCCCCCATCCCAATATCGAATTCCATTACGTCACCGGAATGACAACCGATAACCTC^ 

CTGGATGGCGTCCTGAAACTGACACCCATTGCCGCTGCCGGAAGGCT 

CTCCGCCTCCAGGCAAGCCGAAGTGATTGCCCCTGCCXrrCCAGACAAACTGGCAGAAACTGGAAGTC 

GAGCCTCCGGCGTCCTG ACAACCTCCTGCGG AAACACACTG ACATGCTATATCAAAG CCAG AGCCtSCT^ CAGAGCCG CTGG CCTCTTCGAT AGGCTC 
CAGGTCCTGGATAGCCATTACCAAGACGTCCTGAAAGAGGTCAAGGCTGCCGCTAGCAAAGTGA 
GGGAGAGAATTGCGGATACAGAAGGTGTAGGGCTAGCGGAGTGCTCACCACAAG 
CTGAGATTACCGGACACCTCAAGAATGGCACAATGAGAATCGTCGG 

GTGGGAAGCCAACTGCCTTG CG AACCCGAACCCGATGTGGCTCTGCTCACCTCCAAGG AAGTGAAAGCCGCTGCCTC CAAGGTCAAGGCT AACCTCCT 
TTGACATTACCAAACTGCTCCTGGCTGTGTTTGGCCCT 

AGCCCTCCCTCCATGGCTAGCTCCAGCGCTAGCCCTAGGCCTATCTCCrACCTCAAGGGAA 
CGTCGGCATTTTCAGAGCCGCTGACITTGACCAAGGCTGGGGCCCTATCTCCTACGCTAACGG 

ATCCCCCTAAGCCTAGGCATGTGGGACCCGGAGAGGGAGCCGTCCAGTGGATGAATAGGCTCATCGCTTTCGCTAGCAGAG 

ACCCATTGCCTCTGGATGATGCTCCTGATTAGCCAAGCCGAAGCCGCT 

CATTCCCGATAGGGAAGTGCTCTACAGAGAGTTTGACGAAATGGAAGAGTGT^ 

TCCACCAAAACATTGTGGATGTGCAATACCTCTACGGAGTGGGAAGCTCCATCGCTAGCT 

AGGTGGTTCTGGTTCTGTCTGCTCCTGCTCGCCGCTGGCGTCGGCATT^ 

CATGAGCAAAGCCCATGGCATTG>ACCCTAACATTAGGACAGGCGTCAGGACAATCACAACCGGAAGGGTCCAG 

CTAGGAAAATGATTGGCGGACACTATGTGCAAATGGCTATCATTAAGCT 

TACAATCCCCCTCTGGTCGAGACATGGAAAAAGCCTGACTATGAGCCTACCGCT 

CGTCTACCATGGCGCTGGCACAAGGACAATCGCTAGCCCTTGGGCTCACAATGGCCT 

AAATGGAAACCAAACTGATTACCTGGGGCGCTAAGGGACCCGTCATCCAAATGTATACCAATGTG 

GGCTCCAGGTCCCTGACACCCTGTAAGGTCGTGATTCTGGATAGCTTTGACCC^ 

GATTCTGAGAAAGTCCCTGACAGGCACATACGTCTACAATCACCTCACCCCT 

TCGAGCCTCTGTGTACCAGAGGCGTCGCCAAAGCCGTCGACTTTATCCCTGT 

GCCCTCGGCATTAACGCTGTGGCTTACTATAGGGGACTGGATGTGTCCGTGATTCCCACAAG
CGATCTGGAAGTGGTCACCTCCACCTGGGTGCTCGTGGGAGGCGTC

ATACCGGAGACTTTGACTCCGTGATTGACTGTAACACATGCGTCACCCAAACCGTCGACTT^ 

G TGAAATTCCCTGG CGG AGGCCAAATCGTCGGCGG AGTGTATCTGCTCCCCAG AAGGGGACC CAGAAGGG CTCTGGCTC ACGG AGTGAGAGTGCTCGA 

GGATWCGTCAACTATGCCACAGGCAATCTGCCTGGCTGTAGCTTTAGCATTTTCCTCAGCAAATTCGGATACG^ 

CTAGGAAAGCCGTCGCCCATATCAATAGCGTCTGGAAAGACCTCCTGGAAACCCCTGGCGCTAAGCAAAACATTCAGCT 

TGGCATATCAATAGCACAGCCCTCAACTGTAACGAAAGCCTCAACACAGGCTGGCTGGCTGGCCTC^ 

CCCTCAGAGACTGGCTAGCTGTAGGAGACTGACAGTGGTCCTGCT 

TCATCTCCCAGGCTGAGGCTGCCCTCGACTGTGAGATTTACGGAGCCTGTTACTCCATCGAACCCCTCGAC 

ggcctcagcgcrrtctcctggacagtgtatcacggagccggaaccagaaccattgcctcccccaaaggc 

ccaagacctctacagattcgtcgcccctggcgaaaggcctagcc^aatgtttgactccagc^^ 

ataggtccgagctcagccctctgctcctgtccaccacacagtggcaggtcctgccttgct 

aagctcggcgtcccccctctgagagcctggaggcatagggctaggtccgtgagagcca 

ctccgagac^ctgctcttcaatatcctcggcggatgggtcgccgctcagctcgccgctcccggagccgctaccgct 

tcctgaaagtgccttactttgtgagagtgcaaggcctcctgagaatctgtgccctcgccagaaagatggtgaaaaacggaac 

cccagaacctgtaggaatatgtggagcggaacctttcccattaacgcttacacaaccggagaggtcgccctcag 

cggaaaggctatccctctggaagtgattaagggaggcagacacctcatctttctgacaagggatcccacaacccc^ 

cagccagacacacacccgtcaactcctggctcggcaatatcattagggt 

gaaaaggacagtggtcctgacagagtccaccctcagcacagccctcgccgaactggctaccaaaagc^ 

gagacaataccacaacctccgtgtcctgccaaaggggatacaaaggcgtctggagaggcgatggcatt 

atcacaggccatgtgtttctggtcggccaactgtttacctttagccctaggagacactg^ 

cattttcgtcggcgctggcctcgccggagccgctatcggaagcgtcggcctcggcaaagtgctcgtggatatcct 

tttgggattggatttgcgaagtgctcagcgatttcaaaacctggctc 

gtgt atg aggctg ccg atgccattc tgc atac ccctgg ctgtgtgccttgcgtcaggg aaggc aatgcctccaggtg tag ctccggctgtcccgaaag 

gctcgcctccigcagaaggctcaccgatttcgatcagggatggggacccattagctatc 

gtgacctcgaccctcaggctagggtcgccattaagtccctgacagagacactgtatgtgggagtgtccaagggatc 

gcctatgcccaacagacaaggggactgctcgg<:tgtatcattacctccctgacattctttagcgtcctgat^ 

ggattgcgaaatctatggcgcttgctatagcattgagcctctggattgccaagtgcctagccctc 

ataggtttgcccctccctgtaagcctctgctcagggaaacctgttagattaaggctagggct 

ctggtctgcggagacgatctggtcgtgattatcgatcccaatatcagaaccggagtgagaaccatta 

cggaaagtttctggctgacggatgcaattggacaaggggagagagatgcgatctggaagacag 

caacccaatggcaaaccggacacagaatggcttgggatatgatgatgaattggtcccccacagccgctcrc 

cgatagccctgacgctgagctcatcgaagccaatctgctctggagacaggaaatgggaggcaat^ 
ttaccggactgacacacattgacgctcactttctgtcccag 

atccctaaggctaggagacccgaaggcagaacctgggcccaacccggatacccttggcctctgtatggcaatctgattgtgtttcccgatct 

GAGAGTGTGTGAGAAAATGGCTCTGTATGACGTCGTGTCCAAGCTCCCCCr^^ 
CCATCTTTCTGCTCGCCCTCCTGTCCTGCCTGACCGTCC^ 

GGCGTCGCCGGAGCCCTCGTG^CTTTTCJ^AATCATGAGCGGAGAGGTCAGCTATAGCTCCATGCCTCCCC^ 

GTCCG ACGG AAGCTGGAGCACAGTGTCCAGCG AAG CCGGAAGGC AAGAG ATGGGCGG AAAC AT T ACC AG AGTGG AAAGCG AAAACAAAGTGGTCATCC 

T ACCTCGTG ACAAGGC ATGCCG ATGGCTGTAG CGG AGGCGCTTACG AT ATCATTATCTG TG ACG AATGCC ATAGCAC AG A CGCTACCTCCATCCTCGG 

CATTGGCACAGTGCTCACCTTTACCATTGAGACAACCACACTGCCTCAG^ATGCCGTCAGCAGAACC<IAAA 

CIXX3CATTGACTGTTTCAGAAAGCATCCCGAAGCCACATACTCCAGGTGTGGCTCCGGCCCTTGGATT 

CTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAATAGGGCTGCCGCCCT 

CAGCAATAGCCTCCTGAGACACCATAACCTCX3TGTATATCTCCAGCGAATGCACAACCCCTTC 

TCTXSTGAGGTCCTGTCCGACTTTAAGACATGGCTCAAGGCTAA^ 

GGAGTGTGGAGGGGAGACX^AAGCGGACCCTGGATCACACCCAGATGCCTCGTGGATTACCCTTACAGACTGTGGCACT 
TACCATTTTCAAA 



HepC Savine Cassette Sequences (A+B+C), with specific restriction sites removed, which can be joined
to generate a single expressible open reading frame that encodes the HepC Savine protein above
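
By way of illustration only, the joining of cassettes into one open reading frame, and the screening of the result for unwanted restriction sites, might be checked along the following lines. This is a minimal sketch and not part of the disclosed method: the helper names (clean, translate, join_cassettes, find_sites, is_single_orf), the choice of BamHI and EcoRI as the enzymes screened, and the three short toy fragments are all assumptions made for the example. The listing does not state which sites were removed. The toy fragments are built from the HepCla #201 coding sequence shown earlier, with an ATG start codon and a TGA stop codon added for the purpose of the example.

```python
# Illustrative sketch only. Helper names and the enzyme list are assumptions,
# not taken from the specification.

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical sites to screen for; the listing does not say which were removed.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def clean(seq):
    """Upper-case a sequence and drop every character that is not A, C, G or T."""
    return "".join(ch for ch in seq.upper() if ch in "ACGT")


def translate(dna):
    """Translate in reading frame 0; a trailing partial codon is ignored."""
    dna = clean(dna)
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def join_cassettes(cassettes):
    """Concatenate cassette sequences in the order given."""
    return "".join(clean(c) for c in cassettes)


def find_sites(dna, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    dna = clean(dna)
    return {
        name: [i for i in range(len(dna) - len(site) + 1)
               if dna[i:i + len(site)] == site]
        for name, site in sites.items()
    }


def is_single_orf(dna):
    """True if the sequence starts with ATG, is a whole number of codons,
    and has no stop codon before the final codon."""
    dna = clean(dna)
    protein = translate(dna)
    return dna.startswith("ATG") and len(dna) % 3 == 0 and "*" not in protein[:-1]


# Toy cassettes (hypothetical split of ATG + HepCla #201 coding sequence + TGA).
toy_a = "ATGCTGGCTGCCGGAGTG"
toy_b = "GGAATCTATCTGCTC"
toy_c = "CCCAATAGGGCTGCCTGA"
joined = join_cassettes([toy_a, toy_b, toy_c])

print(translate(joined))      # MLAAGVGIYLLPNRAA*
print(is_single_orf(joined))  # True
print(find_sites(joined))     # {'BamHI': [], 'EcoRI': []}
```

Because clean() discards spaces and other stray characters, the same helpers can be applied to a cassette as printed. For example, the first characters of Cassette A, "ggcggatCC c ca cc", contain the sequence GGATCC at position 3 (counting from 0), upstream of the ATG start codon.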



Cassette A 

ggcggatCC c ca cc ATGGIX^TTCCCXSTC AGG AGAAGGGG AG ACTCCAGK5GGAAGCCTCCTGTCCCCCAGACCCATT AGC 
TATCTGAAAGGCTCCAGCGGAGGCCCTGCCAGAAGGGGAAGGGAAATCCTCCTGGGACCCGCTGACGGAATGGTCAGCAA 
AGGCTGGAGGCTCCTGGCTCCCATTACCGCTTACGCTAGGCTCCACAGATTCGCTCCCCCTTGCAAACCCCTCCTGAGAG 
AGGAAGTGTCCrTCAGAGTGGGACTGCATGAGTATCCCGTCGGCT^ 

ACATGGGGAGCCGATACCGCTGCCTGTGGCGATATCATTAACGGACTGCCTGTGTCCCTGCTCTGCCCTGGCGGACACG 

TGTGGGAATCTTTAGGGCTGCCGTCTGCACAAGGGGAGTGGCTAAGGCTGTGGATTTCATTCCCGTCTGCGTCGTGATTG 

TGGGAAGGATTGTGCTCAGCGGAAAGCCTGCCATTATCCCTGACAGAGAGGTCCTGTATAGGGAgTTtGATGAGATGCCC 

TGTACCCCTCTGCCTGCCCCTAACTATACCTTTGCCCTCrGGAGAGTGTCCGCCGAAGAGTATGTGGAAA 

OSGCGATGCCCTCTACGATGTGGTCAGCAAACrGCCTCTGGCTGTGATGGGCTCC^^ 

GCCAAAGGGTCGAGTTTATCTCCTGGTGTCTGTGGTGGCTCCAGTATTTCCTCACCAGAGTGGAAG 

TGGGTGCCTCCCCTCAACGTCAGGGGAGAGAATCTGGTCATCCTCAACGCTGCCTCCCTGGCT 

CAGCTTTCTGGTCITCTTTTGCTTTGC 

GCCTCCACTCCTACTCCCCOSGAGAGATTAACAGAGTGGCTGCCTCTAACCCTCCCCTCG 
GATTACGAACCCCCTTGTGGTCCACGGATGCCCTCTCC^
CTGGGCTATCAAATGGGAATACGTCGTGCTCCTGTTTCT

GG CCTCCCCT CGGCAATTGGTTTGGCTGT ACCTGG ATG AAT AG CAC AGGCTTTACCAAAGTGTGTGG CGCTCCCCCTTT C 
ACJVGAGGCTATGACAAGGTATAGCGCTCCCCCTGGCGATCCCCCTCAGCCTGAGTATGACCTCGAGCTCATCACAAGCTC 
TAGCTCCTGGCCTCTGCTCCTGCTCCTGCTCGCCCTCCCCCAAAGGGCTTACG 

GCGGAGGCGTCGTGCTCCAGCAAACCAGAGGCCTCCTGGGATGCATTATCACAAGCCTCACCXXSAAG^GATAAGAATCAG 
GTCGAGGGAGAGGTCCAGATTGTGTCCAGCTCCCCCCCTGCCGTCCCCCAAAGCTTTCAGGTCGCCCATCTGCATGCCCC 
TACCGGAAGCGGAAAGTCCACCAAAGTGCCTGCCGCTAAGAGACCCGGACTGCCTGTGTGTCAGGATCACCTCGAGTT 
GGGAAGGCXSTCTTCACAGGCCTCACCCATATCGATGCCCATTTCCT^ 

ACACACGTCACCGGAGGCAATGCCGGAAGGACAACCTCCGGCCTCGTGTCCCTGCTCGAGGTCACCCTCACCCATCCCGT 

CACCAAATACATTATGACATGGATGAGCGCTGACCTCGAGGTCGTGACAAGCACATGGGTGC 

CCCTCACCCTCAGCCGTTACTATAAGAGATACATTAGCTGGTGCCTCTGGTGGCTGCAAT^ 

ATTTGCX5GAAAGTATCTGTTTAACTGGGCCGTCAGGACAAAGCTCAAGCTCACCCCT 

TCTGTCCATCGCTTACTTTAGCATGGTGGGAAACTGGGCCAAAGTGCT 

CCGAAACCCATGTGACAAGGCTCGGCAGAGGCTCCCCCCCTAGCATGGCCTCCAGCTCCGCCTCCCAGCTCTiGCGCTCCC 

TCCCTGAAAGCCACATGCACAGCCAATGG CCTCGTGTCCTTCCT CX3TGTTTTTCTGTTTCGCTTGGTATCTG AAAGGCAG 

ATGGGTCCCCGGAGCCGTCTACGCTCTXSTATGGCATGCAGCTCCCCTGTGAGCCTGAGCCTGACGTCGCCGT 

GCATGCTGACAGACCCTAGCCATATCACAGCCGAAGCCGCTGGC^GAGACrrCCGTGACACCCATTGACACAACCATTATG 

GCTAAGAATGAGGTCTTCTGTGTGCAACCCGAAAAGGGAGGCAGAAAGCCTGCCAG^ 

CCTGGTCCTGAATCCCTCCGTGGCTGCCACACTGGGATTCXXJAGCCrATATC 

CCGGACTGTATCACGTCACCAATGACTGTCCCAATAGCTCCATCGTCTACGAAGCCGCTGACGCTATCCTCCACACAAGC 

TCCTACGGATTCCAATACTCCCCCGGACAGAGAGTGGAgTTtCTCGTGCAAGCCTGGAAGTCCAAGAAAACCCCTATGGG 

ATTCTCCGACACAGCCGCTTGCGGAGACATTATCAATGGCCTCCCCGTCAGCGCTAGGAGAGGCAGAGAGATTCTGCTCG 

GCCCTGCCGATGGCATGAGCCAACTGTCCGCCCCTAGCCTCAAGGCTACCTGTACCGCTAACCATGACT 

GAACTGATTGAGGCTAACCTCCTGTGGAACCCTGCCATTGCCTCCCTGATGGCCTTTAC 

CACCACAAGCCAAACCCTCCTGTTTAACATTCTXSGGACTGGTCC^ 

GCTATGACACAAGGTGTTTCGATAGCACAGTGACAGAGTCCGACATTGACGAAAGGGAAATCTCCGTGCCTGCCGAAA 
CTCAGGAAAAGCAGAAGGTTTGCCCAAGCCCTCCCCGTCTGGGCTAGGCCTGACTATATGTTTGCCCCTACCCTCTGGGC 
TAGGATGATCCTCATGACACACTTTTTCTCCGTGCTCATCGCTAGGGATCAGC^ 
CCTCCGGCGATGTGGTCGTGGTCGCCACAGACGCTCTGATGACCGGATACAC^ 

CATAGCAAAAAGAAATGCGATGAGCTCGCCGCTAAGCTCGTGGCTCTGGGAATCAATGCCGTCGCCTAT^ 
CGACGTCGTGCTCCCCTGTAGCTTTACCJVCACTGCCTGCCCTCAGCACAGGCCTCATCCATCTCCAT 

AtGTCCAGTATCTGTATAAGGGAAGGTGGGTGCCTGGCGCTGTGTATGCCCTCTACGGAATGTGGCCCCTCCTGCTCCTG 

CTCCTGGCTCTGCCTCAGAGAGCCTATAGCCCTATCACATACTCCACCTATGGCAAATTCCTCGCCGATGGCGGATGCTC 

CGGCGGAGCCTATGACATTATCATTTGCGATGAGTGTGCCAGAAGCGTCAGGGCTAGGCTCCTGGCTAGGGGAGGCAGAG 

CCGCTATCTGTGGCAAATACCTCTTCAATTGGGCTGTGAGAACCAAAAAGGCTGTGGCTCACATTAACT 

GATCTGCTCGAGGATAGCGTCACCCCTATCGATACCACAATCATGGCCAAAAACGAgTTt ACACCCTCCCCCGTCGTGGT 

CGGCACAACCGATAGGTCCGGCGCTCCCACATACTCCTGGGGAGCCAATGACACAGACGTCTTCGTCCCCGGATGCGTCC 

CCTGTGTGAGAGAGGGAAACGCTAGCAGATGCTGGGTGGCTATGACACCCACAGTGGCTACCAGAGACGGAAAGCTCCAG 

G ATTG CAC AATGCTCGTGTGTGGCG ATG ACCTCGTGGTCATCTGTG AGTCCGCCGGAG TGCAAGAGG ATG CCGCTAGCCT 

CAGGGCTGTGGCTGGCGCTCTGGTCGCCTTTAAGATTATGTCCGGC^ 

TGCCTG CCATTCTGTCCT ACGATACCAGATGCTTTG ACTCCACCGTCACCGAAAG CG ATATCAGAACCGAAG AGGCT ATC 
TATCAGTGTTGCGATCTcGAcCCCCAAGAGCTCACCCCTGCCGAAACCACAGTGAGACTGAGAGCCTATATGAATACCCC 
TGGCCTCCCCGTCTGCCAAGACCATCTGGAgTTtTGGCCCCAACCCGAATACG 
GCAATGTGTCCGTGGCTCACGATGGCGCTGGCAAAAGGGTCTACTATCTGGGA 

TTTGCCGATCTGATGGGCTATATCCCTCTGGTCGGCGCTCCCCTCGGCGGAGCCGCTGCCATTCCCCTCGAGGTCATCAA 

AGGCGGAAGGC^TCTGATTTTCTGTCACTCCAAGAAAAAGTGTGACGAACTGGCTGCCAAA 

CCGCTCrTGGCTGCCTATTGCCTCAGCACAGGCTGTGTGGTCATCGTCGGCAGAATCGTCCTGT 

GAAAGCGCTGGCGTCCAGGAAGACGCTGCCTCCCTGAGAGCCTTTACCGA^ 

AGACCCTGGCTGGTTCAGAGCCGGATACTCCGGCGGAGACATTTACCA 

GGTTTTGCCTCCTGCTCAGCTCCAGCACAAGCGGAATC^ 

GGATGCCCTCCCGATAGCGATGCCGAAAGGACACAGAGAAGGGGAAGGACAGGCAGAGGCAAACCCGGAATCTATAGGTT 

TGTGGCTCCCC^AGAGAGACCCTCCGGCATGTTCGATGTGAGAATGTATGTGGGAGGCGTCGAGCATAGGCT 

CCTGTAACTGGACCAGAGGCGAAAGGTGTGACCTCGAGGATAGGGATGAGGCTCAGCTCCACGTCT 

AATGTGAGAGGCGGAAGGGATGCCGTCATCCTCCTGATGTGCGTCGTGGATCCCACACrc 

AACCTCCGAGAGAAGCCAACCCAGAGGCAGAAGGCAACCCATTCCCAAAGCCAGA 

CCCATGACGGAGCCGGAAAGAGAGTXTTATTACCTCACC7VGAGACCCTACCACACCCCTCGCCAGAGCCGCTTGGGAAAGC 
GAACCCGCTCCCTCCGGCTGTCCCCCTGACTCCGACGCTGAGTCCTACTCCAGCATGCCCCCTCTGGAAGGCGA^ 
AGACCCTATCGGAGGCCATTACGTCCAGATGGCCATTATCAAACTGGGAGCCCTCACCGGAACCTATGTGTATAACCATC 
TGACACCCCTCAGaGAcCCCTCCACCXaAAGACCTCGTGAATCTGCTCCCC^ 

GGAGTGGTCTGCGCTGCCATTCTGAGAATCCTCGACATGATCGCTGGCGCTCACTGGGGCGTCCTGGCTGGCATTGC 
TTTCTCCATGGTCGGCAATTGGGCTAAGGTCCTGGTCGAGGGATGCGGAra 
GCAGACCCTCCTGGGGACCCACAGACCCTAGGAGAAGGTCCAGGAATg t cga c t gagaa 1 1 cgcc 



Cassette B 



ggcggat ccacca tgc t cgagTGGACAACCCAAGGCTGTAACTGTAGCATTTACCCTGGCCATATCACAGGCCATAGGAT 
GGCCTGGGACATGATGATGAACTGGAGCCCTTGGGTCGCCATGACCCCTAC 
CCACACAGCTCAGGAGACACATTGACCTCCTGGTCGGCTCCAG<»CrCTGGCATTACCCT^ 
TTTAAGGTCAGGATGTArcTCGGCGGAGTGGAACACAGACTGGAAGCCGCTGTGTT^ 

AAGGAAACCCGCTAGGCTCATCGTCTTCCCTGACCTCGGCGTCAGGGTCTGCGAAAAGATGATGGGATACATTCCCCTCG 
TGGGAGCCCCTCTGGGAGGCGCTGCCAGAGCCCTCGCCCATGGCGTCAGGGTGCTGGAAGACGGAGTGAATGGCGGAAAC 
GCTGGCAGAACCACAAGCGGACTGGTCAGCCTCCTGACACCCGGAGCCAAACAGAATATCCAACTGATTAACACAAACGG 

ACTGGCTCTGCTCAGCTGTCTGACAGTGCCTGCCTCCGCCTATCAGGTCAGGAATAGCACAGGCCTCTACCATGTGACAA 

ACGATTGCCCTGGCAGAGACAAAAACCAAGTGGAAGGCGAAGTGCAAATGGTCAGCACAGCCGCTCAGACATTCCTCGCC 

ACATGCATTAACGGAGTGTGTCCCGCTACCCAACTGAGAAGGCATATCGATCTGCTCGTGGGAAGCGCTACCCTCTGCTC 

CGCCCTCTACGTCGGCGATCTGTGTGGCTCCCACGCTCCCACAGGCTCCGGCAAAAGCACAAAGGTCCCCGCTGCCTATC 

CCGCTCAGGGATACAAAGTGCTCGTGCTCAACCCTAGCGTCAGGACATGGGCTCAGCCTG<;CTATCCCTGGCCC 

GGAAACGAAGGCTGTGGCTGGGCCGGATGGCTCCTGTCCCCCAGAGGCTCCACCGAAGACGTCGTGTGTTGCTCCATGTC 

CTACTCCTGGACAGGCGCTCTGGTCACCCCTTGCGCTGCCGAAGAGCAAAAGCTCCCCATTGCCCT 

TCCACCGGATTCACAAAGGTCTGCGGAGCCCCTCCCTGTGTGATTGGCGGAGCCGGAAACAATACCCTCCACTGTCCCAC 

AAGCGTCGAGGAAGCCTGTAGCCTCACCCCTCCCCATAGCGCTAAGTCCAAGTTTGGCTATGGCGCTAAGGATGTGAGAT 

GCCATGCCAGAATCTCCGGCATTCAGTATCTGGCTGGCOTCAGCACACTGCCTGGCAATCCCGCTATCX3CTAGCCTCATG 

GCTTTCACAGCCGCTGTGACACAGATTGTGGGAGGCGTCTACCTCCTGCCTAGGAGAGGCCCTAGGCTCGGCGTCAGGGC 

TACCAGAAAGACAAGCGAAAGGTCCCAGCCTCTGCATAGCTATAGCCCTGGCGAAATCAATAGGGTCGCCGCTTGCCTGA 

GG AAA CTGGGAGTGCCTCCCCTCAGGGCTTGGAGACACAGAACCGCTAGGCATACCCCTGTGAATAGCTGGCT 

ATTATCATGTTCGCTCCCACACTGTGGGCCAGAATGATTCTGATGACCCATGAGAATCTGGAAA 

TGTGTTTACCGATAACTCCAGCCCTCCCGCTGTGCCTCAGTCCTTCCAAGTGGCTCACCTCGCCACACCCCCTGGCrCXrG 
TGACAGTGCCTCACCCTAACATTGAGGAAGTGGCTCTGTCCACCACAGGCGAAATCCCTTTCTATGGCAAACTGGTCTTC 
GATATCACAAAGCTCCTGCTCGCCGTCTTCGGACCCCTCTGGATTCTGCAAGCCTCCCT 

CACCGCTGCCCTCGTGATGGCCCAACTGCTCAGGATTCCCCAAGCCATTCTGGATATGATTGCCGGAGCCCATTG^GGAG 

TGCTCGCCGGATGCAATACCTGTGTGACACAGACAGTGGATTTCTCCCTcGAcCCCACATTCACAATCGAAACCACAACC 

CTCCCCCAAGACGCTGTGTCCCACGGACCCACACCCCTCCTGTATAGGCTCGGCGCTGTGCAAAACGAAGTGACACTGAC 

ACACCCTGTG AC AAAGT ATAT CATGAC CTGTGCC AGAGTGGCT ATC AAAAG CCTCACCGAAAGGCTCT ACGTCGGCGGAC 

CCCTCACCAATAGCAGAGGCGAAAACTGTGGCTATAGGAGATGCGTCATCGGAGGCGCTGGCAATAACACACTGC 

CCTACCGATTGCTTTAGGAAACACCCTGAGGCTACCTATAGCAGATGCGGAACCTGTGGCTCCAGCGATCTGTATCTGX^ 

CACCAGACACGCTGACGTCATCCCTGTGAGAAGGAGAGGCGATAGCAGAGGCTCCCTGCTCAACATGTGGTCCGGCACAT 

TCCCTATCAATGCCTATACCACAGGCCCTTGCACACCCCTCCCCGCTCCGAATTACACATTCGCTCTGTGGCACTCCACC 

GATGCCACAAGCATTCTGGGAATCGGAACCGTCCTGGATCAGGCTGAGACAGCCGGAGCCAGACTGGTCGTGCTCGCCAC 

ATACGTCCCCGAAAGCGATGCCGCTGCCAGAGTGACAGCCATTCTGTCCAGCCTCACCGTCACCCAACTGCTCAGGAGAC 

TGCATCAGTGXSAGGCCTAGGTGGGGCCCTACCGATCCCAGAAGGAGAAGC^GAAACCTCGGCAAAGTG 

ACATGCGGATTCGCTGACCTCGGCCCTGACCAAAGGGCTTACTGTTGGCATTACCCTCCCAAACCC^ 

CCGAACAGTTTAAGCAAAAGGCTCTGGGACTGCTCCAGACATACCAAGCCACAGTGTGTG 

CCTAGCTGGGACCAAATGTGGAAGTGTCTGATTAGGCTCAAGCCTACCCTCTGCGGAATCGTCCCCGCTAAGTCCGTGTG 
TGGCCCTGTGTATTGCTTTACCCCTAGCCCTGTGGTCGTGGGAACCACAGACAGAAGCGGAAGCTCCCTGACAGTGACAC 
AGCTCCTGAGAAGGCTCCACCAATGGATTAGCTCCGAGTGTACCACACCCTGTAGCGGAAGCTGGCTGAGAGACCTCAGC 
GATGGCTCCTG^TCCACCGTCAGCTCCGAGGCTGGCACAGAGGATGTGGTCTGCTGTAGCATGAGCTATAGCTGGACCGG 
ATGGGATC^GATGTGGAAATGCCTCATCAGACTGAAACCCACACTGCATGGC 

CCGTCCAGAATCTGGCTGAGCAATTCAAACAGAAAGCCCTCGGCCTCCTGCAAACCGCTAGCAGACAGGCTGAGGTCATC 

GCTCCCGCTGTGCAAACCAATTGGCAAAAGCTCGAGGTCTTCTGGGCCAAACACATGTGGAATTTCATTAGCGG 

ATACCTCGCCGGACTGTCCACCCTCCCCGGACTGATTGCCTTTCCCTCCAGX^ 

TGCCTGAGTCCGACGCTGCCGCTAGGGTCACCGCTATCCTCGCCACACTGTGTAGCGCTCTGTATGTGGGAGACCTCTGC 

GGAAGCGTCTTCCTCGTGGGACAGCTCTTCACATTCTrCCCCCAGAAGGCATAGCTCCGTGCTCTGCGAATGCTATGACGC 

TGGCTGTGCCTGGTACGAACTGACACCCGCTGAGACAACCGTCAGGCTCAGGGCTTACIATGGGCTGGGTGGCTGCCCAAC 

TGGCTGCCCCTGGCGCTGCCACAGCCTTTGTGGGAGCCGGACTGGCTGGCX3CTGCCATTGGCTCCGTGGGAAGCTGG 

ATTAACTCCACCGCTCTGAATTGCAATGAGTCCCTGAATACCGGATGGCTCGCCGGACTGTTTTACCAACAC 

TAACGCTCTGTCCAACTCCCTGCTCAGGGATCACAATCTGGTCTACTCCACCACA^ 

AGAAAGTCACAGCCGCTATGTCCACCAATCCCAAACCCCA 

GTCAAGTTTCCCGGAGGCGGAAGCCAAACCAAACAGTCCGGCGAAAACTTTCCCTATCT^ 

CTGCGKTTAGGGCTCAGGCTCCCCCTCCCTCCGCCCCTACCTATAGCTGG^GGCGCTAACGATACCGATGTGTTTG 

ACAATACCAGACCCCCTCTGGGAAACTGGTTCGGATGCACAGTGCCTCCCCCTAGGAAAAAGAGAACCGTCGTGCTCACC 

GAAAGCACACTGTCCACCGCTCTGGCrcAGCTCGCCACAAAGTCCTTCGGAAGCACAACCT 

ACAGAAAAAGGTCACCTTTGACAGACTGCAAGTGCTCGACTCCCACTATCAGGATGTGCTCGACCAAGCCGAAACCGCTG 
GCGCTAGGCTCGTGGTCCTGGCTACCGCTACCCCTCCCGGAAGCGTCACCGTCCCCCATCCCAATATCGAgTTtCATTAC 
GTCACCGGAATGACAACCGATAACCTCAAGTGTCCCIKn*CAGGTCCCCTCCCCC 
CCTGAAACTGAGACCCATTGCCGCTGCCGGAAGGCTCGACCTCAGCGGATGG 

TCTATCACTCCGCCTCCAGGCAAGCCGAAGTGAT^GCCCCTGCCGTCCAGACAAACTGGCAGAAACTGGAAGTGTTTTGG 
GCTAAGCATATGTGGAACTTTTGCAGAGCCTCCGGCGTCCTGACAACCTCCTGCX^^ 

AGCCAGAGCCGCTTGCAGAGCCGCTGGCCTCTTCGATAGGCTCCAGGTCCTGGATAGCCATTACCAAGACGTCCTGAAAG 
AGGTCAAGGCTGCCGCTAGCAAAGTGAAAGCCAATCTGCTCGGCCCTCTGACAAACTCCAGGGGAGAGAATTGCGGATAC 
AGAAGGTGTAGGGCTAGCGGAGTGCTCACCACAAGCTGTGGCAATACCCTCATCATGCACACAAGGTGTCACTGTGGCGC 
TGAGATTACCGGACACGTCAAGAATGGCACAATGAGAATCGTCGGCCCTAGGACATGCAGAGAGGTCAGCTTTAGGGTCG 
GCCTCCACGAATACCCTGTGGGAAGCCAACTGCCTTGCGAACCCGAACCCGATGTGGCTGTGCTCACCTCCAAGGAAGTG 
AAAGCCGCTGCCTCCAAGGTCAAGGCTAACCTCCTGTCCGTGGAAGAGGCTTGCTCCCTGACA 

AGGCAGAGACGCTGTGATTCTGCTCATGTGTGTGGTCCACCCTACCCTCGTGTTTGACATTACCAAACTGCTCCTGGCTG 
TGTTTCGCCCTATGCTCACCGATCCCTCXrCACATTACCGCTGAGGCTCCCGGA^ 

TCCATGGCTAGCTCCAGCGCTAGCCCTAGGCCTATCTCCTACCTCAAGGGAAGCTCCGGCGGACCCCTCCTGTGTCCCGC 
TGGCCATGCCGTCGGCATTTTCAGAGCCGCTGACTTTGACCAAGG^TGGGGCCCTATCTCCTACGCTAACGGAAG 
CCGATCAGAGACCCTATTGCTGGCACTATCCCCCTAAGCCTAGGCATGTGGGACCGGGAGAGGGAGCCGTCCAGTGGATG 
AATAGGCTCATCGCTTTCGCTAGCAGAGGCAATCACGTCAGCCCTACCCATctcgagtgagaattcgcc 
Cassette C 



ggcgga t c caeca t gc t cgagTGCCTCTGGATGATGCTCCTGATTAGCCAAGCCGAAGCCGCTCTGGAAAACCTCGTG AT 

TCTGAATGCCGCTAGCCTCGCCGGAACCCATATCATTCCCGAra 

AGTGTAGCCAACACCTCCCCTATATCGAACAGGGAATGATGCTC^ 

CT CTA CGG AGTGGG AAG CT C C A TCG CT AG CTGGG C CA TT AAGTGGGAGT ATGTGTCC CA CGCT AGG CCT AG G TGGTTCTG 
GTTCTGTCTGCTCCTGCTCGCCGCTGGCGTCGGCATTTACCTCCreCCTAAC^GAGC 
GCGCITAGATGAGCAAAGCCCATGGC^TTGACCCTAACATTAGGACAGGCG 
GGACTGCTCAGGATTTGCGCTCTGGCTAGGAAAATGATTGGCGGACACT^ 

TAGGAGATTCGCTCAGGCTCTGCCTGTGTGGGCCAGACCCGATTACAATCCCCCTCTGGTCGAGACATC 
ACTATGAGCCTACCGCTGCCCAAACO"TT 'CTGG CTACCTGTATCAATGGCGTCTGCTGG ACCGTCTACCATGGCGCTGGC 
ACAAGGACAATCGCTAGCCCTTGGGCTCACAATGGCCTCAGGGATCTGGCTGTGGCTG 

AATGGAAACCAAACTGATTACCTGGGGCGCTAAGGGACCCGTCATCCAAATGTATACCAATGTGGATCAGGATCTGGTC^ 

GCTGGCCCGCTCCCCAAGGCTCCAGGTCCCTGACACCCTCTAAGGTCGTGATTCTGGATAGCTTTGACCC^ 

GAAGAGGATGAGAGAGAGATTAGCGTCCCCGCTGAGATTCTGAGAAAGTCCCTGAC^GGCACATACGTCTACAA 

CACCCCTCTGAGAGACTGGGCCCATAACGGACTGAGAGACCTCGCCX5TCGCCGTCGAGCCTGTGTGT 

CCAAAGCCGTgGAt TTTATCCCTGTGGAAAACCTCGAGACAACCATGAGGTCCCCCGTCTTCACAGACAATGCCCTCGGC 

ATTAACGCTGTGGCTTACTATAGGGGACTGGATGTGTCCGTGATTC^ 

TATGTCCGCCGATCTGGAAGTGGTCACCTCCACCTGGGTGCTCGTGGGAGGCGTCCTGGCTGCCCTCGCCGCTTACT 

TGTCCACCGGAGCCCTCATGACAGGCTATACCGGAGACTTTGACTCCGTGATTGACTGTAACACATGCGTC^ 

GTgGAtTTTAGCCTCGACCCTAACACAAACAGAAGGCCTCAGGATGTGAAATTCCCTGGCGGAGGCCAAATCG 

CCACAGGCAATCTGCCTGGCTGTAGCTTTAGCATTTTCCTCAGCAAATT 

GCTAGGAAAGCCGTCGCCCATATCAATAGCGTCTGGAAAGACCTCCTGGAAACCCCTGGCGCTAAGC 

CATCAATACCAATGGCTCCTGGCATATCAATAGCACAGCCCTCAACn'GTAACGAAAGCCT 

GCCTCTTCTATCAGCATAAGTTTAACTCCAGCGGATGCCCTGAGAGACTGGCTAGCTGT 

CTCTTCCTCCTGCTCGCCG ATGCCAGAGTGTGTAG CIX5TCTGTGG ATGATGCTGCTCATCTCCCAGGCTG AGGCTG CCCT 

CGACTGTGAGATTTACGGAGCCTGTTACTCCATCGAACCCCTCGACCTCCCCCCTATCATTCAGAGACTGCATGG 

GCGCTTTCTCCTGGACAGTGTATCACGGAGCCGGAACCAGAACCATTGCCTCCCCCAAAGGCCC^ 

ACAAACGTgGAtCAAGACCTCTACAGATTCGTCGCCCCTGGCGAAAGGCCTAGCXSGAATGTTTGACTCCAGCGTCC 

TGAGTGTTACGATGCCGGATGCGCTTGGTATAGGTCCGAGCTCAGCCCTCTGCTCCTGTCCACCACACAGTGGCAGGTCC 

TGCCTTGCTCCTTCACAACCCTCCCCGCTCTGTCCACCGGACTGAGAAAGCTCGGCGTCCCCCCTCTGAGAGCCT 

CATAGGGCTAC^TCCGTGAGAGCCAGACTGCTCGCCAGAGGCGGAAGGGCTAGCCCTCTC 

CTTCAATATCCTCGGCGGATGGGTCGCCX5CTCAGCTCGCCGCTCCCGGAGCCGCTACCGCT 

GCCTCCTGAAAGTGCCTTACTTTGTX1AGAGTGCAAGGCCTCCTGAGAATCTC 

GGAACCATGAGGATTGTGGGACCCAGAACCTGTAGGAATATGTGGAGCX5GAACCTTTCCCATTAAC 
AGAGGTCGCCCTCAGCACAACCGGAGAGATTCCCTTTTACGGAAAGGCTATCCCTCTGGAAGTGATTAAGGGAG 
ACCTCATCTTTCTGACAAGaGAcCCCACAACCCCTCT 
TCCTGGCTCGGCAATATCATTAGGGTCAGCGCTGAGGAATACX^TCXiAG 

AGGCATGACCACAGACAATCTGAAATGCCCTCCCGTCGTGCATGGCnxrrCCCCTCCCCCCTCCCAGAAGCCCTCCCG 

CCCCTCCCAGAAAGAAAAGGACAGTGGTCCTGACAGAGTCCACCCTCAGCACAGCCCTCGCCGAACTGGCTACCAAAAGC 

TTTGGCTCCAGCTCCACCTCCGGCATTACCX3GAGACAATACCACAACCTCCGTGTCCTGCCAAAGGGG 

CTGGAGAGG CG ATGG CATTATGC AT ACCAG ATG CC ATTGCGGAGCCG AAATCAC AGGCCATGTGTTTCTGGTCGG CC AAC 

TGTTTACCTTTAGCCCTAGGAGACACTGGACCACACAGGGATGCAATTGCTCCATCTATCCCGGACACATTTTCGTCGGC 

GCTGGCCTCGCCGGAGCCGCTATCGGAAGCGTCGGCCTCGGCAAAGTGCTCGTGGATATCCTCGCCGGATACGGAGCCGG 

AGACATTTGGGATTCGATTTGCX3AAGTCCrCAGCX3ATTTCAAAACCT 

GCATTCCCTTTAACTCCAGCATTGTGTATGAGGCTGCCGATGCCATTCTGCATACCCCTG^ 

GAAGGCAATGCCTCCAGGTGTAGCTCCGGCTGTCCCGAAAGGCTCGCCTCCTGCAGAAGGCT 

ATGGGGACCCATTAGCTATGCGAATGGCTCCAGGACAGAGGAAGCCATTTAC^ 

GGGTCGCCATTAAGTCCCTGACAGAGAGACTGTATGTGGGAGTGTCCAAGGGATGGAGACTGCTCGCCCCTATCACAGCC 

TATGCCCAACAGACAAGGGGACTGCTCGGCTGTATCATTACCTCCCTGACATT 

ACTGGAACAGGCTCTGGATTGCGAAATCTATGGCGCTTGCTATAGCATTGAGCCTCTGGATTG 

AGTTTTTCACAGAGCTCGACGGAGTGAGACTGCATAGGTTTGCCCCTCCCTGTAAGCCTCTGCTCAGGGAAACCTGTTAC 

ATTAAGGCTAGGGCTGCCTGTAGG^CTGCCGGACTGCAAGACTGTACCATGCTGCT 

TATCGATCCCAATATCAGAACCGGAGTGAGAACCATTACCACAGGCTCCCCCATTACCT 

TGGCTGACGGATGCAATTGGACAAGGGGAGAGAGATGCGATCTGGAAGACAGAGACAGAAGCX3AACTGTCCCCC 
CTCAGCACAACCCAATGGCAAACCGGACACAGAATGGCTTGGGATATGATGATC 

CATGGCTCAGCTCCTG AG AATCCCTCAGGCTCC CGG AG CCCTCGTGGTCGGCGTCGTGTGTGCCGCT ATCCTCAGG AG AC 
ACGTCXSGCCCTGGCGAAGGCGCTGTGCAATGGATGAACAGACACGATAGCCCTC 

CTCTGGAGACAGGAAATGGGAGGCAATATCAa^GGGTCGAGTCCGAGAATGAGGGAGTGTTTACCOT 

TGAOSCTCACTTTCTGTCCCAGACAAAGCAAAGCGGAGAGAA 

TCCCTAAGGCTAGGAGACCCGAAGGCAGAACCTGGGCCCAACCCGGATACCCTTO 

TTTCCCGATCTGGGAGTGAGAGTGTGTGAGAAAATGGCTCTGTATGACGTCGTGT^ 

ATACGCTACCGGAAACCTCCCCGCATGCTCCTTCTTCCATCTTTCT 

G CG CTT ACCAA CTGGG AAA GGTCCTGGTgG At ATT CTGGCTGGCT ATGG CGCT 

AAAATCATGAGCGGAGAC^CAGCTATAGCTCCATGCCTCCCCTCGAGGGAGAGCCT^ 

AAGCTGGAGCACAGTGTCCAGCGAAGCCXK»AAGGCAAGAGATGGGCGGAAAC^ 

TGGTCATCCTCGACTCCTTCGATCCCCTCGTGGCTGAGGAAGTGGGATGGCCTC 

CCTTGCACATGCGGAAGCTCCGACCTCTACCrrCGTGACAAGG 

TATCTGTGACGAATGCCATAGCACAGArcCTACCTCCATCCTCXK3CATTGGCACAGTGCT 

C CACA CTGCCTC AGG ATGCCG TC AGCAG AA CCC AAAC^ AG AGGCAG AACCGGAAGGGG AAAGCCTGG CATTC 

AGAAAGCATCCCGAAGCCACATACTCCAGGTGTGGCTCCGGCCCTTGGATTACCCCTAGGTGTCTGGTgGAtTATC 

TCTGGCTGCCGGAGTGGGAATCTATCTGCTCCCCAATAGGGCTGCCGCCCTCGTGACACCCT 
AACTGCCTATCAATGCCCTCAGCAATAGCCTCCTGAGACACCATAACCTCGTGTATATCTCCAGCGAATGCACAACCCCT 

TGCTCCGGCTCCTGGCTCAGGGATATCTGGGACTGGATCTGTGAGGTCCTGTCCGACTTTAAGACATGGCTCAAGGCTAA 

GCTCATGCCTCAGCTCCCCGGAATCCCTTTCGTCAGCTGTCAGAGAGGCTATAAGGGAGTGTGGAGGGGAGACGGAAGCG 

GACCCTGGATCACACCCAGATGCCTCGTGGATTACCCTTACAGACTGTGGCACTATCCCTGTACC^ 

TTCAAAagatctTGAgtcgacgaattcgcc 
Melanoma Savine design 

Two savines - one containing scrambled melanocyte differentiation Ags 
- one containing scrambled melanoma cancer specific Ags 

Genes in melanocyte differentiation Savine 

gp100 

MDLVLKRCLLHLAVIGALLAVGATKVPRNQDWLGVSRQLRTKAWNRQLYPEWTEAQRLDCWRGGQVSLKVSNDGPTLI 

GANASFSIALNFPGSQKVLPDGQVIWVNNTIINGSQWGGQ 

GQ YWQVLGGPVSGLS I GTGRAMLGTHTMETVTVYH 

NQPLTFALQLHDPSGYIAEADLSYTVTOFGDSSGTLI^ 

HRPTAEAPNTTAGQVPTTEWGTTPGQAPTAEPSGTTSVQVPTTETVISTAPVQMPTAESTGMTPEKVPVSEVMGTTLA 

EM S T P E ATGM T PAE V S I WL S GTTAAQ VTTTEWVE TT AR ELPIPEPEG PDAS SIMSTESI TG S L»G PI»L»DGT ATLRL V K 

RQVPLDCVLYRyGSFSVTLDIVQGIESAEILQAVPSGEGDAFELTySCQGGLPKEACMEISSPGCQPPAQRLCQPVLP 

SPACQLV^HQILKGGSGTYCLNVSlxADTNSLA 

FSVPQLPHSSSHWLRLPRIFCSCPIGENSPLLSGQQV 

MART 

MPREDAHFIYGYPKKGHGHSYTTAEEAAGIGILTVILGVLLLIGCWYCRRRNGYRALMDKSLHVGTQCALTRRCPQEG 
FDHRDSKVSLQEKNCEPVVPNAPPAYEKLSAEQSPPPYSP 

TRP-1 

PAFLTWHRYHLLRLEKDMQEMLQEPSFSLPYWNFATGKNVCDICTDDLMGSRSNFDSTLISPNSVFSQWRVVCDSLED 
YDTLGTLCNSTEDGPIRRNPAGNVARPMVQRLPEPQDVAQCLEVGLFDTPPFYSNSTNSFRNTVEGYSDPTGKYDPAV 
RSLHNLAHLFLNGTGGQTHLSSQDPIFVLLHTFTDAVFDEWLRRYNADISTFPLENAPIGHNRQYNMVPFWPPVTNTE 
MFVTAPDNLGYTYE 

Tyros 

MLLAVLYCLLWSFQTSAGHFPRACVSSKNLMEKECCPPWSGDRSPCGQLSGRGSCQNILLSNAPLGPQFPFTGVDDRE 

SWPSVFYNRTCQCSGNFMGFNCGNCKFGFWGPNCTERRLLVRRNIFDLSAPEKDKFFAYLTLAKHTISSDYVIPIGTY 

GQMKNGSTPMFNDINIYDLFVWMHYYVSMDALLGGSEIWRDIDFAHEAPAFLPWHRLFLLRWEQEIQKLTGDENFTIP 

YWDWRDAEKCDICTDEYMGGQHPTNPNLLSPASFFSSWQIVCSRLEEYNSHQSLCNGTPEGPLRRNPGNHDKSRTPRL 

PSSADVEFCLSLTQYESGSMDKAANFSFRl^LEGFASPLTGIADASOSSMHNAIjHIYMNGTMSQVQGSANDPIF^ 

AFVDSIFEQWLQRHRPLQEVYPEANAPIGHNRESYMVPFIPLYRNGDFFISSKDLGYDYSYLQDSDPDSFQDYIKSYL 

EQASRIWSWLLGAAMVGAVLTALLAGLVSLLCRHKRKQLPEEKQPLLMEKEDYHSLYQSHL 

TRP2 

MS PLWWGFLLSCLGCKI LPGAQGQFPRVCMTVDSLVNKECCPRLGAESANVCGSQQGRGQCTEVRADTRPWSGPY I LR 

NODDRELWPRKFFHRTCKCTGNFAGYNCGDCKFGWTGPNCERKKPPVIRQNIHSLSPQEREQFLGALDLAKKRVHPDY 

VirrQHWLGLU3PNGTQPQFANCSV^FFWLHYYSVTU)TLlX3PGRPYRAIDFSHQGPAFVTWHRYHLL 

IGNESFALPYWNFATGRNECDVCTDQLFGAARPDDPTLISRNSRFSSWETVCDSLDDYNHLVTLCNGTYEGLLRRNQM 

GRNSMKLPTLKDIRDCLSLQKFDNPPFFQNSTFSFRNALEGFDKADGTLDSQVT^SLHNLVHSFLNGTNALPHSAANDP 

IF\nn J HSFTDAIFDEWMKRFNPPADAWPQEIAPIGHNRMYNKTO 

WPTTL.LVVMGTLVALVGLFVLLAFLQYRRLRKGYTPLMETHLSSKRYTEEA 

MC1R 

MAVQGSQRRLLGSLNSTPTAIPQLGLAANQTGARCLEVSISDGLFLSLGLVSLVENALWATIAKNRNLHSPMYCFIC 
CLALSDLLVSGTNVLETAVILLLEAGALVARAAVLQQLDNVIDVITCSSMLSSLCFLGAIAVDRYISIFYALRYHSIV 
TLPRAPRAVAAI WVASVVFSTLFIAYYDHVAVLLCLVI/FFLAMLVLMAVIjYVHMLARA 

FGLKGAVTLTI LLGI FFLCWGPFFLHLTLI VLCPEHPTCGCI FKNFNLFLALI ICNAI IDPLI YAFHSQELRRTLKEV 
LTCSW 

MUC1F 

MTPGTQSPFFLLLLLTVLTWTGSGHASSTPGGEKETSATQRSSVPSSTEKNAVSMTSSVLSSHSPGSGSSTTQGQDV 
TLAPATEPASGSAATWGQDVTSVPVTRPALGSTTPPAHDVTSAPDNK 
MUC1R 



NRPALGSTAPPVHNVTSASGSASGSASTLVHNGTSARATTTPASKSTPFSIPSHHSDTPTTLASHSTKTDASSTHHSS 
VPPLTSSNHSTSPQLSTGVSFFFLrSFHISNLQFNSSLEDPSTDYYQELQRDISEMFLQIYKQGGFLGLSNIKFRPGSV 
WQLTLAFREGTI^^raDVETQFNQYKTEAASRYNLTISDVSVSDVPFPFSAQSGAGVPGWGIALLVLVCVLVALAIVY 
LIALAVCQCRRKNYGQLDIFPARDTYHPMSEYPTYHTHGRYVPPSSTDRSPYEKVSAGNGGSSLSYTNPAVAAASANL 



NB Muc 1 Repeat sequences in the middle of the gene were removed 



Genes in melanoma specific Savine 
BAGE 

MAARAVFLALSAQLLQARLMKEESPWSWRLEPEDGTALCFI F 
GAGE - 1 

MSWRGRSTYRPRPRRYVEPPEMIGPMRPEQFSDEVEPATPEEGEPATQRQDPAAAQEGEDEGASAGQGPKPEADSQEQ 
GHPQTGCECEDGPDGQEMDPPNPEEVKTPEEEMRSHYVAQTGILWLLMNNCFLNLSPRKP 

gp100In4 

SWSQKRSFVYVWKTWGEGLPSQPIIHTCVYFFLPDHLSFGRPFHLNFCDFL 



MSLEQRSLHCKPEEALEAQQEALGLVCVQAATSSSSPLVLGTLEEVPTAGSTDPPQSPQGASAFPTTINFTRQRQPSE 
GSSSREEEGPSTSCILESLFRAVITKKVADLVGFLLLKYRAREPWKAEMLESVIKNYKHCFPEIFGKASESLQLVFG 
IDVKEADPTGHSYVLVTCLGLSYDGLLGDNQIMPKTGFLIIVLv^ 

PRKLLTQDLVQEKYLEYRQVPDSDPAKYEFLWGPRAIjAETSYVKVLEYVI KVSARVRFFFPSLREAALREEEEGV 



MPLEQRSQHCKPEEGLEARGEALGLVGAQAPATEEQEAASSSSTLVEVTLGEVPAAESPDPPQSPQGASSLPTTMNYP 
LWSQSYEDSSNQEEEGPSTFPDLESEFQAALSRKVAELVHFLLLKYRAREPVTK/^MLGSVVGNWQYFFPVIFSKASS 
SLQLVFGIELMEVDPIGHLYIFATCLGLSYDGLLGDNQIMPKAGIjI-iI IVLAI IAREGDCAPEEKIWEELSVLEVFEGR 
EDSILGDPKKLLTQHFVQEOTLEYRQVPGSDPACYEFLWGPRAIiV^TSYVKVL,HHMVKISGGPHISYPPLHEWVLREG 
EE 



MERRRLWGSIQSRYISMSWTSPRRLVELAGQSLLKDEALAIAAL.ELLPRELFPPLFMAAFDGRHSQTLKAMVQAWPF 
TCLPLGVLMKGQHLHLETFKAVLDGI^VLLAQEWPRRWKLQVLDLRKNSHQDFWTWSGNRASLYSFPEPEAAQPMT 
KKRKVDGLSTEAEQPFI PVEVLVDLFLKEGACDELFS YLI EKVKRKKNVLRifCCKKLKI FAMPMQDIKMILKMVQLDS 
IEDLEVTCTWKLPTLAKFSPYLGQMINLRRLLLSHIHASSYISPEKEEQYIAQFTSQFLSLQCLQALYVDSLFFLRGR 
LDQLLRHvWPLETLSITNCRLSEGDVT^LSQSPSVSQLSVLSLSGVTILTDVSPEPLQALLERASATLQDLVFDEGGI 
TDDQLIiALLPSLSHCSQLTTLS FYGNS I S I SALQSLLQHLIGLSNLTHVLYPVPLES YEDIHGTLHLERLAYLHARLR 
ELLCELGRPSMVWLSANPCPHCGDRTFYDPEPILCPCFMPN 

TRP2IN2 

LMETHLSSKRYTEEAGGFFPWLKVYYYRFVIGLRVWQWEVI SCKLI KRATTRQP 
NYNSOla 

MQAEGRGTGGSTGDADGPGGPGIPDGPGGNAGGPGEAGATGGRGPRGAGAARASGPGGGAPRGPHGGAASGLNGCCRC 
GARGPESRLLEFYU^PFATPMEAEI^RSIAQDAPPLPVPGVLLKEFTVSGNILTIRLTAADHRQLQLSISSCLQQL 
SLLMWI TQCFLPVFLAQPPSGQRR 

NYNSOlb 

MLMAQEAlAFLMAQGAMIJ^QERRVPRAAEVPGAQGQQGPRGREEAPRGvT?MAARLQG 



MAGE - 1 



MAGE- 3 



PRAME 



LAGE1 
MQAEGQGTGGSTGDADGPGGPGIPDGPGGNAGGPGEAGATGGRGPRGAGAARASGPRGGAPRGPHGGAASAQDGRCPC 
GARRPDSRLLQLHITMPFSSPMEAELVRRILSRDAAPLPRPGAVLKDFTVSGNLLFIRLTAADHRQLQLSISSCLQOL 
SLLMWITQCFLPVFLAQAPSGQRR 



Differentiation Savine Scramble process 



Disease name : melanoma 
Input filename : Diffmucg.txt 
Output filename : Diffmucs.txt 
Number genes : 8 
Number segments : 187 
Segment length : 30 
Segment overlap : 15 
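
The segment listing that follows is consistent with a simple sliding-window split: each protein appears to be padded with two alanines ("AA") at each end, then cut into 30-residue segments that start every 15 residues, with 1-based offsets. The sketch below illustrates that reading of the parameters above. The function name, the "AA" padding and the stopping rule are assumptions inferred from the listing, not a description of the program actually used.

```python
def split_into_segments(protein, length=30, overlap=15, pad="AA"):
    """Return (offset, segment) pairs with 1-based offsets, as in the listing.

    Assumptions (inferred from the listing, not stated in the source):
    - the protein is padded with `pad` at both ends before splitting;
    - the last segment is whatever remains once a window reaches the end.
    """
    padded = pad + protein + pad
    step = length - overlap          # 30 - 15 = 15 residues between starts
    segments = []
    start = 0
    while True:
        segments.append((start + 1, padded[start:start + length]))
        if start + length >= len(padded):
            break                    # this window already covers the end
        start += step
    return segments


# First 43 residues of gp100, used only as a short illustration.
gp100_start = "MDLVLKRCLLHLAVIGALLAVGATKVPRNQDWLGVSRQLRTKA"
for offset, segment in split_into_segments(gp100_start):
    print(offset, segment)
```

With the full gp100 sequence as input, the first two segments produced this way match Segment 1 (offset 1) and Segment 2 (offset 16) in the listing.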



Segments in original order: 



Gene : gp100 
Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMDLVLKRCLLHLAVIGALLAVGATKVPR 
GCCGCTATGGATCTGGTCCTGAAAAGGTGTCTGCTCCACCTCGCCGTCATCGGAGCCCTCCTGGCTGTGGGAGCCACAAAGGTCCCCAGA 



Gene : gp100 
Segment # : 2 
Offset : 16 
1st Codon : 1 

VIGALLAVGATKVPRNQDWLGVSRQLRTKA 
GTGATTGGCGCTCTGCTCGCCGTCGGCGCTACCAAAGTGCCT^ 



Gene : gp100 
Segment # : 3 
Offset : 31 
1st Codon : 1 

NQDWLGVSRQLRTKAWNRQLYPEWTEAQRL 



AACCAAGACTGGCTCGGAGTGTCCAGGCAACTGAGAACCAAAGCCrGGAACAGACAGCT 



Gene : gp100 
Segment # : 4 
Offset : 46 
1st Codon : 1 

WNRQLYPEWTEAQRLDCWRGGQVSLKVSND 



TGGAATAGGCAACTGTATCCCGAATGGACAGAGGCTCAGAGACTGGATTGCTGGAGGGGAGGCCAAGTGTCCCTGAAAGTGTCCAACGA 



Gene : gp100 
Segment # : 5 
Offset : 61 
1st Codon : 1 

DCWRGGQVSLKVSNDGPTLIGANASFSIAL 
GACTGTTGGAGAGGCGGACAG^TCAGCCTCAAGGTCAGCAATGACGGACCCACACTGATTGGCGCTAACGCTAGCTTTAGCATTGCC 



Gene : gp100 
Segment # : 6 
Offset : 76 
1st Codon : 1 

GPTLIGANASFSIALNFPGSQKVLPDGQVI 



Gene : gp100 
Segment # : 7 
Offset : 91 
1st Codon : 1 

NFPGSQKVLPDGQVIWVNNTIINGSQVWGG 



AACTTTCCCGGAAGCCAAAAGGTCCTGCCTGACGGACAGGTCATCTGGGTGAATAACACAATCATTAAC 



Gene : gp100 
Segment # : 8 
Offset : 106 
1st Codon : 1 

WVNNTIINGSQVWGGQPVYPQETDDACIFP 
TGGGTCAACAATACCATTATCAATGGCTCCCAGGTCTG^GGGAGGCCAACCCGTC™^ 



Gene : gp100 
Segment # : 9 
Offset : 121 
1st Codon : 1 

QPVYPQETDDACIFPDGGPCPSGSWSQKRS 
CAGCCTGTGTATCCCCAAGAGACAGACGATGCCTGTATCTTTCCCGAT^ 

Gene : gp100 
Segment # : 10 
Offset : 136 
1st Codon : 1 

DGGPCPSGSWSQKRSFVYVWKTWGQYWQVL 
GACGGAGGCCCTTGCCCTAGCGGAAGCTGGAGCCAAAAGAGAAGCTTTGTGTATGTGTGGAAGACATGGGGACAGTATTG 

Gene : gp100 
Segment # : 11 
Offset : 151 
1st Codon : 1 

FVYVWKTWGQYWQVLGGPVSGLSIGTGRAM 
TTCGTCTACGTCTTGGAAAACCTGGGGCCAATACTGGCAGGTCCTGGGAGGCCCTGTGTCCGGCCT 

Gene : gp100 
Segment # : 12 
Offset : 166 
1st Codon : 1 

GGPVSGLSIGTGRAMLGTHTMEVTVYHRRG 
GGCGGACCCCTCAGCGGACTCTCCATCGGAACCGGAAGGGCTATGCT 

Gene : gp100 
Segment # : 13 
Offset : 181 
1st Codon : 1 

LGTHTMEVTVYHRRGSRSYVPLAHSSSAFT 
CTGGGAACCCATACCATGGAGGTCACCGTCTACCATAGGAGAGGCTCCAGGTCCTACGTCCCCCT 

Gene : gp100 
Segment # : 14 
Offset : 196 
1st Codon : 1 

SRSYVPLAHSSSAFTITDQVPFSVSVSQLR 
AGCAGAAGCTATGTGCCTCTGGCTCACTCCAGCTCCGCCTTTACCATTACCGATCAGGTCCCCTTTAGCGTCAGCGTCAGCCAACTGAGA 

Gene : gp100 
Segment # : 15 
Offset : 211 
1st Codon : 1 

ITDQVPFSVSVSQLRALDGGNKHFLRNQPL 
ATCACAGACCAAGTGCCTTTCTCCGTGTCCGTGTCCCAGCTCAGGGCTCT 

Gene : gp100 
Segment # : 16 
Offset : 226 
1st Codon : 1 

ALDGGNKHFLRNQPLTFALQLHDPSGYLAE 
GCCCTCGACTGGAGGCAATAAGCATTTCCTCAGGAATCAGCCTCTGACATTCGCTCTGCAA 

Gene : gp100 
Segment # : 17 
Offset : 241 
1st Codon : 1 

TFALQLHDPSGYLAEADLSYTWDFGDSSGT 
ACCTTTTGCCCTCCAGCTCCACGATCCCTCCGGCTATCTGGCTGAGGC^ 

Gene : gp100 
Segment # : 18 
Offset : 256 
1st Codon : 1 

ADLSYTWDFGDSSGTLISRALVVTHTYLEP 

Gene : gp100 
Segment # : 19 
Offset : 271 
1st Codon : 1 

LISRALVVTHTYLEPGPVTAQVVLQAAIPL 
CTGAl^AGCAGAGCCCTCGTGGTCACCCATACCTATCTGGAACCCGGACCCGTCACCGCTCAGGTCGTGCTCCAG^ 

Gene : gp100 
Segment # : 20 
Offset : 286 
1st Codon : 1 

GPVTAQVVLQAAIPLTSCGSSPVPGTTDGH 
GGCCCTGTGACAGCCCAAGTGGTCCTGCAAGCCGCTATCCCTCTGACAAGCTGTGGCTC 

Gene : gp100 
Segment # : 21 
Offset : 301 
1st Codon : 1 

TSCGSSPVPGTTDGHRPTAEAPNTTAGQVP 
ACCTCCTGCGGAAG^TCCCCCGTCCCCGGAACCACAGACGGACACA 

Gene : gp100 
Segment # : 22 
Offset : 316 
1st Codon : 1 

RPTAEAPNTTAGQVPTTEVVGTTPGQAPTA 
AGGCCTACCGCTGAGGCTCCCAATACCACAGCCGGACAGGTCCCCACAACrc 

Gene : gp100 
Segment # : 23 
Offset : 331 
1st Codon : 1 

TTEVVGTTPGQAPTAEPSGTTSVQVPTTEV 
ACCACAGAGGTCGTGGGAACCACACCCGGACAGGCTTCCCACAGCCGAACCCTCCGGCACAACCTCCGTGCAAGTGCCTACCACAGAGGTC 

Gene : gp100 
Segment # : 24 
Offset : 346 
1st Codon : 1 

EPSGTTSVQVPTTEVISTAPVQMPTAESTG 
GAGCCTAGCGGAACCACAAGCGTCCAGGTCCCCACAACCGAAGTGATTAGCACAGCCCCTGTGCAAATGCCTACCGCTCAGTC 

Gene : gp100 
Segment # : 25 
Offset : 361 
1st Codon : 1 

ISTAPVQMPTAESTGMTPEKVPVSEVMGTT 
ATCTCCACCGCTCCCGTCCAGATGCCCACAGCCGAAAGC^CAGGCATGACCCCTGAGAAAGTGCCTGTC 

Gene : gp100 
Segment # : 26 
Offset : 376 
1st Codon : 1 

MTPEKVPVSEVMGTTLAEMSTPEATGMTPA 
ATGACACCCGAAAAGGTCCCCGTCAGCGAAGTGATGGGCACAACCCTCGCCGAAATGT^ 

Gene : gp100 
Segment # : 27 
Offset : 391 
1st Codon : 1 

LAEMSTPEATGMTPAEVSIVVLSGTTAAQV 
CTGGCTGAGATGAGCACACCCGAAGCCAC^GGCATGACCCCTGCCGAAGTGTCCA 

Gene : gp100 
Segment # : 28 
Offset : 406 
1st Codon : 1 

EVSIVVLSGTTAAQVTTTEWVETTARELPI 
GAGGTCAGCATTGTGGTCCTCTCCGGCAGAACCGCTGCCCAAGTGACAA 



Gene : gp100 
Segment # : 29 
Offset : 421 
1st Codon : 1 

TTTEWVETTARELPIPEPEGPDASSIMSTE 



ACCACW^CCGAATGGGTrcAGACAACCGCTAGGGAACTGCCTATCC 



Gene : gp100 
Segment # : 30 
Offset : 436 
1st Codon : 1 

PEPEGPDASSIMSTESITGSLGPLLDGTAT 



Gene : gp100 
Segment # : 31 
Offset : 451 
1st Codon : 1 

SITGSLGPLLDGTATLRLVKRQVPLDCVLY 
AGCATTACCGGAAGCCTCGGCCCTCTGCTrcACGGAACCGCTACCCTCAGGCTC^ 



Gene : gp100 
Segment # : 32 
Offset : 466 
1st Codon : 1 

LRLVKRQVPLDCVLYRYGSFSVTLDIVQGI 



CTGAGACTGGTCAAGAGACAGGTCCCCCTCGACTGTGTGCTCTACAGATACGGAAGC^ 



Gene : gp100 
Segment # : 33 
Offset : 481 
1st Codon : 1 

RYGSFSVTLDIVQGIESAEILQAVPSGEGD 



Gene : gp100 
Segment # : 34 
Offset : 496 
1st Codon : 1 

ESAEILQAVPSGEGDAFELTVSCQGGLPKE 



GAGTCCGCCGAAATCCTCCAGGCTCTGCCTAGCGGAGAGGGAGACGCTTTCGAACTGA 



Gene : gp100 
Segment # : 35 
Offset : 511 
1st Codon : 1 

AFELTVSCQGGLPKEACMEISSPGCQPPAQ 



GCCTTTGAGCTCACCGTCAGCTGTCAGGGAGGCCTCCCCAAAGAGGCTTGra 



Gene : gp100 
Segment # : 36 
Offset : 526 
1st Codon : 1 

ACMEISSPGCQPPAQRLCQPVLPSPACQLV 
GCCTGTATGGAAATCTCCAGCCCTGGCTGTCAGCCTCCCGCTCAGAGACTGTGTCAGC 



Gene : gp100 
Segment # : 37 
Offset : 541 
1st Codon : 1 

RLCQPVLPSPACQLVLHQILKGGSGTYCLN 



AGGCTCTGCCAACCCGTCCTGCCTAGCCCTGCCTGTCAGCTCGTGCTCCACCAAATCCT 



Gene : gp100 
Segment # : 38 
Offset : 556 
1st Codon : 1 

LHQILKGGSGTYCLNVSLADTNSLAVVSTQ 
CTGCATCAGATTCTGAAAGGCGGAAGCGGAACCTATTGCCTCAACGTCAGCCTCGCCGATACCAATAGGCTCGCCGTCGTGTC 



Gene : gp100 
Segment # : 39 
Offset : 571 
1st Codon : 1 

VSLADTNSLAVVSTQLIMPGQEAGLGQVPL 
GTGTCCCTGK5CTGACACAAACTCCCTGGCTGTGGTCAGCACACAGCTCATC^TC 

Gene : gp100 
Segment # : 40 
Offset : 586 
1st Codon : 1 

LIMPGQEAGLGQVPLIVGILLVLMAVVLAS 
CTGATT ATGCCTGG CC AAG AGG CTGGCCTCGG CC AAGTGCCTCTG ATTGTGGG AATCCTCCTGGTCCTGATGGCCGTCGTGCTCX5CCT 

Gene : gp100 
Segment # : 41 
Offset : 601 
1st Codon : 1 

IVGILLVLMAVVLASLIYRRRLMKQDFSVP 
ATCGTCGGCATTCTGCTCGTGCTCATGGCTGTGGTCCTrGGCTAGCCT 

Gene : gp100 
Segment # : 42 
Offset : 616 
1st Codon : 1 

LIYRRRLMKQDFSVPQLPHSSSHWLRLPRI 
CTGATTT AC AG AAGG AG ACTG ATG AAG CAAG ACTTTAG CGTCCCCCAACTGCCTCACTCCAGCTCCCACrGGCTGAG ACTGCCTAGGATT 

Gene : gp100 
Segment # : 43 
Offset : 631 
1st Codon : 1 

QLPHSSSHWLRLPRIFCSCPIGENSPLLSG 
CAGCTCCCCCATAGCTCCAGCCATTGGCTCAGGCTCCCCAGAATCTTTTGCTCCTGCCCTAT 

Gene : gp100 
Segment # : 44 
Offset : 646 
1st Codon : 1 

FCSCPIGENSPLLSGQQVAA 
TTCTGTAGCTGTCCCATTGGCGAAAACTCCCCCCTCCTGTCCGGCCAACAGGTCGCCGCT 

Gene : MART 
Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMPREDAHFIYGYPKKGHGHSYTTAEEAA 
gccgctatgcctagg^aagacgctcactttatctatggctatcccaaaaagggaca 

Gene : MART 
Segment # : 2 
Offset : 16 
1st Codon : 1 

KKGHGHSYTTAEEAAGIGILTVILGVLLLI 
AAGAAAGGCCATGGCCATAGCTATACCACAGCCGAAGAGGCTGCCGGAATCGGAATCCTGACCGTCATCCTCGGCGTCCTGCT 

Gene : MART 
Segment # : 3 
Offset : 31 
1st Codon : 1 

GIGILTVILGVLLLIGCWYCRRRNGYRALM 
GGCATTGGCATTCTGACAGTGATTCTGGGAGTGCTCCTGCTCATCGGATGCTGGTACTGTAGGAGA 

Gene : MART 
Segment # : 4 
Offset : 46 
1st Codon : 1 

GCWYCRRRNGYRALMDKSLHVGTQCALTRR 
GGCTGTTGGTATTGCAGAAGGAGAAACGGATACAGAGCCCTCATGGATAAGTCCCTGCATGTGG^ 

Gene : MART 
Segment # : 5 
Offset : 61 
1st Codon : 1 

DKSLHVGTQCALTRRCPQEGFDHRDSKVSL 



Figure 27 (Cont) 



WO 01/090197 PCT/AU01/00622 




GACAAAAGCCTCCACGTCGGCACACAGTGTGCCCTCACCAGAAGGTGTCCCCAAGAGGGAT^ 

Gene : MART 
Segment # : 6 
Offset : 76 
1st Codon : 1 

CPQEGFDHRDSKVSLQEKNCEPVVPNAPPA 
TGCCCTCAGGAAGGCTTTGACCATAGGGATAGCAAAGTGTCCCTGCAAGAGAAAAACTC 

Gene : MART 
Segment # : 7 
Offset : 91 
1st Codon : 1 

QEKNCEPVVPNAPPAYEKLSAEQSPPPYSP 
CAGGAAAAGAATTGCGJ^CCCGTCGTGCCTAACGCTCCCC 

Gene : MART 
Segment # : 8 
Offset : 106 
1st Codon : 1 

YEKLSAEQSPPPYSPAA 
TACXSAAAAGCTCAGCGCIGAGCAAAGCCCTCCCCCTTACrCCCCCGCTGCC 

Gene : TRP-1 
Segment # : 1 
Offset : 1 
1st Codon : 1 

AAPAFLTWHRYHLLRLEKDMQEMLQEPSFS 
GCCGCTCCCGCTTTCCTCACCTGGCACAGATACCATCTGCrCAGGCrrCGAGAAAGACATC 

Gene : TRP-1 
Segment # : 2 
Offset : 16 
1st Codon : 1 

LEKDMQEMLQEPSFSLPYWNFATGKNVCDI 

CTGGAAAAGGATATGCAAGAGATGCTGCAAGAGCCTAGCTTTAGCCTCCCCTATTGGAATTTCGCT 

Gene : TRP-1 
Segment # : 3 
Offset : 31 
1st Codon : 1 

LPYWNFATGKNVCDICTDDLMGSRSNFDST 
CTGCCTTACTGGAACTTTGCCACAGGCAAAAACGTCTGCGATATCTGTACCGAT^ 

Gene : TRP-1 
Segment # : 4 
Offset : 46 
1st Codon : 1 

CTDDLMGSRSNFDSTLISPNSVFSQWRVVC 
TGCACAGACGATCTGATGGGCTCCAGGTCCAACTTTGACTCCACCCTCATCTCC<XTCAATAGCGTCTTCTC 

Gene : TRP-1 
Segment # : 5 
Offset : 61 
1st Codon : 1 

LISPNSVFSQWRVVCDSLEDYDTLGTLCNS 
CTGATTAGCCCTAACTCCGTGTTTAGCCAATGGAGAGTGGTCTGCGATAGCCTCGAGGATTACGATACCCTCGGCACACTGTGTAACT 

Gene : TRP-1 
Segment # : 6 
Offset : 76 
1st Codon : 1 

DSLEDYDTLGTLCNSTEDGPIRRNPAGNVA 
GACTCCCTGGAAGACTATGACACACTOGGAACCCrCTGCAATAGCACAGAG 

Gene : TRP-1 
Segment # : 7 
Offset : 91 
1st Codon : 1 

TEDGPIRRNPAGNVARPMVQRLPEPQDVAQ 
ACCGAAGACGGACCCATTAGGAGAAACCCTGCCGGAAACGTCGCCAGACCCATGGTGCAAAGGCTCCCCGAACCCCAAGACGTCGCCCAA 

Gene : TRP-1 
Segment # : 8 
Offset : 106 
1st Codon : 1 

RPMVQRLPEPQDVAQCLEVGLFDTPPFYSN 
AGGCCTATGGTCCAGAGACTGCCTGAGCCTCAGGATGTGGCTCAGTGTCTGGAAGTGGGAC^ 



Gene : TRP-1 
Segment # : 9 
Offset : 121 
1st Codon : 1 

CLEVGLFDTPPFYSNSTNSFRNTVEGYSDP 
TGCCTCGAGGTCGGCCTCTTCGATACCCCTCCCTTTTACTCCAACTCCACCAATAGCTTTAGGAATACCGT 



Gene : TRP-1 
Segment # : 10 
Offset : 136 
1st Codon : 1 

STNSFRNTVEGYSDPTGKYDPAVRSLHNLA 



AGCACAAACTCCTTCAGAAACACAGTGGAAGGCTATAGCGATCCCACAGGCAAATACGATCCCX3 



Gene : TRP-1 
Segment # : 11 
Offset : 151 
1st Codon : 1 

TGKYDPAVRSLHNLAHLFLNGTGGQTHLSS 
ACCC^AAACTATGACCCTGCCGTCAGGTCCCTGCATAACCT 



Gene : TRP-1 
Segment # : 12 
Offset : 166 
1st Codon : 1 

HLFLNGTGGQTHLSSQDPIFVLLHTFTDAV 
CACCTCTTCCTCAACGGAACCGGAGGCCAAACCC^TCTGTCCAGCCAAGACCCrATCTT^ 



Gene : TRP-1 
Segment # : 13 
Offset : 181 
1st Codon : 1 

QDPIFVLLHTFTDAVFDEWLRRYNADISTF 



CAGGATCCCATTTTCGTCCTGCTCCACACATTCACAGACGCTGTGTTTGACGAATGG 



Gene : TRP-1 
Segment # : 14 
Offset : 196 
1st Codon : 1 

FDEWLRRYNADISTFPLENAPIGHNRQYNM 
TTCGATGAGTGGCTGAGAAGGTATAACGCTGACATTAGCACATTCCCT 



Gene : TRP-1 
Segment # : 15 
Offset : 211 
1st Codon : 1 

PLENAPIGHNRQYNMVPFWPPVTNTEMFVT 



CCCCTCGAGAATGCCCCTATCGGACACAATAGGCAATACAATATGGTCCCCTTTTGGCCTCCCGTCA 



Gene : TRP-1 
Segment # : 16 
Offset : 226 
1st Codon : 1 

VPFWPPVTNTEMFVTAPDNLGYTYEAA 
GTGCCTTTCTGGCCCCCTGTGACAAACAO^GAGATCTTCGTCACCGCrCCCG 



Gene : Tyros 
Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMLLAVLYCLLWSFQTSAGHFPRACVSSK 
GCrcCTATGCTCCTGGCTGTGCTCTACTGTCTGCTCTGGTCCTTCCAAACCT 



Gene : Tyros 
Segment # : 2 
Offset : 16 
1st Codon : 1 

QTSAGHFPRACVSSKNLMEKECCPPWSGDR 
CAGACAAGCGCTGGCCATTTCCCTAGGGCTTGCGTCAGCTCC 



Gene : Tyros 
Segment # : 3 
Offset : 31 
1st Codon : 1 

NLMEKECCPPWSGDRSPCGQLSGRGSCQNI 
AACCTCATGGAAAAGGAATGCTGTCCCCCTTGGTCCGGCGATAG^ 



Gene : Tyros

Segment # : 4
Offset : 46
1st Codon : 1

SPCGQLSGRGSCQNILLSNAPLGPQFPFTG



Gene : Tyros 

Segment # : 5
Offset : 61
1st Codon : 1

LLSNAPLGPQFPFTGVDDRESWPSVFYNRT
CIX5CTCAGCAATGCCCCTCTGGGACCCCAATTCCCTTTCACAGGCX3TC 



Gene : Tyros 

Segment # : 6
Offset : 76
1st Codon : 1

VDDRESWPSVFYNRTCQCSGNFMGFNCGNC
GTGGATGACAGAGAGTCCTGGCCTAGCGTCTTCTATAACAGAACCTGTCAGTGTAGCGGAAACTTTAT^ 

Gene : Tyros

Segment # : 7
Offset : 91
1st Codon : 1

CQCSGNFMGFNCGNCKFGFWGPNCTERRLL
TGCCAATGCTTCCGGCAATTTCATGGGCTTTAACTGTGGCAATTGCAA^ 

Gene : Tyros 

Segments : 8 
Offset : 106 

1st Codon : 1 

KFGFWGPNCTERRLLVRRNIFDLSAPEKDK
AAGTTTGGCTTTTGGGGACCCAATTGCACAGAGAGAAGGCTCCTGGTCAGGAGAAACATTTTCGATCT 



Gene : Tyros 

Segments : 9 
Offset : 121 
1st Codon : 1 

VRRNIFDLSAPEKDKFFAYLTLAKHTISSD
GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCGAAAAGGATAAGTTTTTCGC^ 

Gene : Tyros 

Segments : 10 
Offset : 136 
1st Codon : 1 

FFAYLTLAKHTISSDYVIPIGTYGQMKNGS
TTCTTTGCCTATCTGAO\CTGGCTAAGCATACCATTAGCTCCGACTATGTGATT<X 

Gene : Tyros 

Segments : 11 
Offset : 151 
1st Codon : 1 

YVIPIGTYGQMKNGSTPMFNDINIYDLFVW
TACGTCATCCCTATCGGAACCTATGGCCAAATGAAAAACGGAAGCACACCCAT 



Gene : Tyros 

Segment # : 12
Offset : 166
1st Codon : 1

TPMFNDINIYDLFVWMHYYVSMDALLGGSE



Gene : Tyros

Segment # : 13
Offset : 181
1st Codon : 1

MHYYVSMDALLGGSEIWRDIDFAHEAPAFL
ATGCATTACTATGTGTCCATGGATGCCCTCCTGGGAGGCTCCGAGATTTGGAGAGACATTGACTTTGCCCATGAGGCT 



Gene : Tyros 

Segment # : 14
Offset : 196
1st Codon : 1

IWRDIDFAHEAPAFLPWHRLFLLRWEQEIQ
ATCTGGAGGGATATCGATTTCGCTCACGAAGCCCCTGCCTTTCTGCCTTGGCATAGGCTCTTCCTCCTG^ 



Gene : Tyros 

Segment # : 15
Offset : 211
1st Codon : 1

PWHRLFLLRWEQEIQKLTGDENFTIPYWDW



Gene : Tyros 

Segments : 16 
Offset : 226 
1st Codon : 1 

KLTGDENFTIPYWDWRDAEKCDICTDEYMG 
AAGITTCACCGGAGACGAAAACTTTACCATTCCCTATTGGGATTGGAGAGACGCTGAGAAATGCGATATCTGTA 

Gene : Tyros 

Segments : 17 
Offset : 241 

1st Codon : 1 

RDAEKCDICTDEYMGGQHPTNPNLLSPASF
AGGGATGCCGAAAAGTGTGACATTTGCACAGACGAATACATGGGCGGACAGCATCCCACAAACCCTAA 

Gene : Tyros 

Segments : 18 
Offset : 256 

1st Codon : 1 

GQHPTNPNLLSPASFFSSWQIVCSRLEEYN
GGCCAACACCCTACCAATCCCAATCTGCTCAGCCCTGCCTCCTTCTTTAGCTCCTGGCAAATCGTCTGCTCCA 



Gene : Tyros 

Segments : 19 
Offset : 271 

1st Codon : 1 

FSSWQIVCSRLEEYNSHQSLCNGTPEGPLR
TTCTCCAGCTGGCAGATTGTGTGTAGCAGACTGGAAGAGTATAACTCCCACCAAAGCCTCTGCAATGGCACACCCGAAGGCCCTCTC 



Gene : Tyros 

Segments : 20 
Offset : 286 

1st Codon : 1 

SHQSLCNGTPEGPLRRNPGNHDKSRTPRLP
AGCCATCAGTCCCTGTGTAACGGAACCCCTCAGGGACCCCTCAGGAGAA 



Gene : Tyros 

Segments : 21 
Offset : 301 
1st Codon : 1 

RNPGNHDKSRTPRLPSSADVEFCLSLTQYE 
AGGAATCCCGGAAACCATGACAAAAGCAGAACCCCTAGGCTCCCCTCCAGCGCTGACGTCGAGTTTT 



Gene : Tyros 

Segments : 22 
Offset : 316 
1st Codon : 1 

SSADVEFCLSLTQYESGSMDKAANFSFRNT 
AGCTCCGCCGATGTGGAATTCTGTCTGTCCCTGACACAGTATGAGTCCGGCTCCATGGATAAGGCTGCCAATTT^

Gene : Tyros

Segment # : 23
Offset : 331 
1st Codon : 1 

SGSMDKAANFSFRNTLEGFASPLTGIADAS 
AGCGGAAGCATGGACAAAGCCGCTAACTTTAGCTTTAGGAATACCCTCGAGGGATTC^ 

Gene : Tyros 

Segment # : 24
Offset : 346
1st Codon : 1

LEGFASPLTGIADASQSSMHNALHIYMNGT
CTGGAAGGCTTTGCCTCCCCCCTCACCGGAATCX3CTGA 

Gene : Tyros 

Segment # : 25
Offset : 361
1st Codon : 1

QSSMHNALHIYMNGTMSQVQGSANDPIFLL
CAGTCCAGCATGCACAATGCCCTTCCACATTTACATGAACGGAACCATGAGCCAAGTGCAAGGCTCCGC 

Gene : Tyros 

Segments : 26 
Offset : 376 

1st Codon : 1 

MSQVQGSANDPIFLLHHAFVDSIFEQWLQR
ATGTCCCAGGTCCAGGGAAGCGCTAACGATCCCATTTTCCTCCrGCATCACGCTTTCGTCGACTCC 

Gene : Tyros 

Segment # : 27
Offset : 391
1st Codon : 1

HHAFVDSIFEQWLQRHRPLQEVYPEANAPI
CACCATGCCTTTGTGGATAGCATTTTCGAACAGTGGCT 

Gene : Tyros

Segment # : 28
Offset : 406
1st Codon : 1

HRPLQEVYPEANAPIGHNRESYMVPFIPLY
CACAGACCCCTCCAGGAAGTGTATCCCGAAGCCAATGCCCCTATCGGACACAATAGGGAAAGCTATATGGTCCCCTTT 

Gene : Tyros 

Segments : 29 
Offset : 421 

1st Codon : 1 

GHNRESYMVPFIPLYRNGDFFISSKDLGYD
GGCCATAACAGAGAGTCCrACATGGTGCCTTTCATTCCCCTCTACAGAAACGGAGACTTTTT 

Gene : Tyros 

Segments : 30 
Offset : 436
1st Codon : 1 

RNGDFFISSKDLGYDYSYLQDSDPDSFQDY 
AGGAATWCGATTTCTTTATCTCCAGCAAAGACCTCGGCTATGACTATAGCTATCTGCAAGACTCCGAC 

Gene : Tyros 

Segments : 31 
Offset : 451 

1st Codon : 1 

YSYLQDSDPDSFQDYIKSYLEQASRIWSWL
TACTCCTACCTCCAGGATAGCGATCCCGATAGCTTTCAC^ATTACATTAAGTCCTA^ 

Gene : Tyros 

Segments : 32 
Offset : 466 
1st Codon : 1 

IKSYLEQASRIWSWLLGAAMVGAVLTALLA 
ATCAAAAGCTATCTGGAACAGGCTAGCAGAATCTGGAGCTGGCTGCTCGGCGCT^ 

Gene : Tyros

Segment # : 33
Offset : 481 
1st Codon : 1 

LGAAMVGAVLTALLAGLVSLLCRHKRKQLP 
CTGGGAGCCGCTATGGTCGGCGCTGTGCTCACCGCTCTGCTCGCCGGACTGGTCAGCCTCCTC 

Gene : Tyros 

Segments : 34 
Offset : 496 

1st Codon : 1 

GLVSLLCRHKRKQLPEEKQPLLMEKEDYHS
GGCCTCGTGTCCCTGCTCTGCAGACACAAAAGGAAACAGCTCC 

Gene : Tyros 

Segment # : 35
Offset : 511
1st Codon : 1

EEKQPLLMEKEDYHSLYQSHLAA
GAGGAAAAGCAACCCCTCCTGATGGAGAAAGAGGATTACCATAGCCTCTACCAAAGCCATCTGGCTGCC 

Gene : TRP2 

Segment # : 1
Offset : 1
1st Codon : 1

AAMSPLWWGFLLSCLGCKILPGAQGQFPRV
G CCG CTATGTCCCCCCTCTGGTGGGG CTTTCrGCTCAGCTGTCTGGG ATGC AAAATC CC AATTCCCT AGGGTC 

Gene : TRP2 

Segments : 2 
Offset : 16 

1st Codon : 1 

GCKILPGAQGQFPRVCMTVDSLVNKECCPR
GGCIXjTAAGATTCTGCCTGGCtSCTCAGGGACAGTTTCCCAGAGTGTGT^ 

Gene : TRP2 

Segment # : 3 
Offset : 31 

1st Codon : 1 

CMTVDSLVNKECCPRLGAESANVCGSQQGR 
TGCATGACCGTCGACTCCCTGGTCAACAAAGAGTGTTGCCCTAG^CTCGGCGCTG 

Gene : TRP2 

Segments : 4 
Offset : 46 
1st Codon : 1 

LGAESANVCGSQQGRGQCTEVRADTRPWSG
CTGGGAGC CG AAAG CXSCT AACGT CTGCGG AAGCCAACAGGG AAGGGG A CAG TG T ACCG AAGTGAGAG CCG ATAC C AG A CCCTGG AGCGG A 

Gene : TRP2 

Segments : 5 
Offset : 61 
1st Codon : 1 

GQCTEVRADTRPWSGPYILRNQDDRELWPR
GGCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTGGTCCGGCCCTTACA1TCTGAGAAACCAAGA 

Gene : TRP2 

Segments : 6 
Offset : 76 
1st Codon : 1 

PYILRNQDDRELWPRKFFHRTCKCTGNFAG
CCCTATATCCTCAGGAATCAGGATGACAGAGAGCTCTGGCCTAG^AAATTCTTC 

Gene : TRP2 

Segments : 7 
Offset : 91 

1st Codon : 1 

KFFHRTCKCTGNFAGYNCGDCKFGWTGPNC 
AAGTTTTTCCATAGGACATGCAAATGCACAGGCAATTTCGCTGGCT 

Gene : TRP2 

Segment # : 8
Offset : 106
1st Codon : 1

YNCGDCKFGWTGPNCERKKPPVIRQNIHSL 
TACAATTGCGGAGACTGTAAGTTTGGCTGGACCGGACCCAATTGCGAAAGGAAAAAGCCTCC^ 

Gene : TRP2 

Segments : 9 
Offset : 121 
1st Codon : 1 

ERKKPPVIRQNIHSLSPQEREQFLGALDLA
GAGAGAAAGAAACCCCCTGTGATTAGGCAAAACATTCACTCCCTGTCCCCCCAAGAGAGAGAGCAA 

Gene : TRP2 

Segment# : 10 
Offset : 136 

1st Codon : 1 

SPQEREQFLGALDLAKKRVHPDYVITTQHW
AGCCCTCAGGAAAGGGAACAGTTTCTGGGAGCCCTCGACCTCGCCAAAAAGAGAGTGC^ 

Gene : TRP2 

Segment # : 11
Offset : 151
1st Codon : 1

KKRVHPDYVITTQHWLGLLGPNGTQPQFAN
AAGAAAAGGGTCCACCCTGACTATGTGATTACCACACAGCATTGGCTCGGCCTCCT 

Gene : TRP2 

Segments : 12 
Offset : 166 
1st Codon : 1 

LGLLGPNGTQPQFANCSVYDFFVWLHYYSV 
CTGGGACTGCTCGGCCCTAACGGAACCCAACCCCAATTCGCTAA 

Gene : TRP2

Segment # : 13
Offset : 181
1st Codon : 1

CSVYDFFVWLHYYSVRDTLLGPGRPYRAID 
TGCTCCGTGTATGA C1T TT T CGTCTGGCTCCACTATTA 

Gene : TRP2 

Segments : 14 
Offset : 196 

1st Codon : 1 

RDTLLGPGRPYRAIDFSHQGPAFVTWHRYH
AGGGATACCCTCCTGGGACCCGGAAGGCCTTACAGAGCCATTGACTTTAGCCATCAGGGACCCGCTTTCGTCACCTGGCACAGATACCAT 

Gene : TRP2 

Segments : 15 
Offset : 211 

1st Codon : 1 

FSHQGPAFVTWHRYHLLCLERDLQRLIGNE
TTCTCCCACCAAGGCCCTGCCTTTGTGACATGGCATAC^TATCACCTC 

Gene : TRP2 

Segments : 16 
Offset : 226 

1st Codon : 1 

LLCLERDLQRLIGNESFALPYWNFATGRNE
CTGCTCTGCCTCGAGAGAGACCTCCAGAGACTGATTGGCAATGAGTCCTTCGCTCTGCCTC

Gene : TRP2 

Segments : 17 
Offset : 241 

1st Codon : 1 

SFALPYWNFATGRNECDVCTDQLFGAARPD 
AGCTTTGCCCTCCCCTATTGGAATTTCGCTACCGGAAGGAATGAGTGTGACGT 

Gene : TRP2 

Segments : 18 
Offset : 256 
1st Codon : 1 

CDVCTDQLFGAARPDDPTLISRNSRFSSWE
TGCGATGTGTGTACCGATCAGCTCITCGGAGCCGCTAGGCCTG^

Gene : TRP2 

Segments : 19 
Offset : 271 

1st Codon : 1 

DPTLISRNSRFSSWETVCDSLDDYNHLVTL 
GACCCTACCCTCATCTCCAGGAATAGCAGATTCTCCAGCTGGGAGACA 

Gene : TRP2 

Segments : 20 
Offset : 286 
1st Codon : 1 

TVCDSLDDYNHLVTLCNGTYEGLLRRNQMG 
ACCGTCTGCGATAGCCTCGACGATTACAATCACCTCGTGACACTC 

Gene : TRP2 

Segments : 21 
Offset : 301 
1st Codon : 1 

CNGTYEGLLRRNQMGRNSMKLPTLKDIRDC
TGCAATGGCACATACGAAGGCCTCCTGAGAAGGAATCAGATGGGCAGAAACTCCAT^ 

Gene : TRP2 

Segments : 22 
Offset : 316 
1st Codon : 1 

RNSMKLPTLKDIRDCLSLQKFDNPPFFQNS 
AGGAATAGCATGAAGCTCCCCACACTGAAAGACATTAGGGATTGCCTCAGCCT 

Gene : TRP2 

Segments : 23 
Offset : 331 
1st Codon : 1 

LSLQKFDNPPFFQNSTFSFRNALEGFDKAD
CTGTCCCTGCAAAAGTTTGACAATCCCCCI^CTTTCAGAATAGCACATTCT 

Gene : TRP2

Segments : 24 
Offset : 346 
1st Codon : 1 

TFSFRNALEGFDKADGTLDSQVMSLHNLVH 
ACCTTTAGCTTTAGGAATGCCCTCGAGGGATTCGATAAGGCTGACGGAACCCT 

Gene : TRP2 

Segment # : 25
Offset : 361
1st Codon : 1

GTLDSQVMSLHNLVHSFLNGTNALPHSAAN
GGCACACTGGATAGCCAAGTGATGAGCCTCCACAATCTGGTCCACTCC^ 

Gene : TRP2 

Segments : 26 
Offset : 376 
1st Codon : 1 

SFLNGTNALPHSAANDPIFVVLHSFTDAIF
AGCTTTCTGAATGGGACAAACGCTCTGCCTTCACTC 

Gene : TRP2 

Segment # : 27
Offset : 391
1st Codon : 1

DPIFVVLHSFTDAIFDEWMKRFNPPADAWP
GACCCTATCTTTGTGGTCCTGCATAGCTTTACCGATGCCATTT^ 

Gene : TRP2 

Segments : 28 
Offset : 406 
1st Codon : 1 

DEWMKRFNPPADAWPQELAPIGHNRMYNMV 
GACGAATGGATGAAGAGATTCAATCCCCCTGCCGATGCCTGC^CCCAA

Gene : TRP2

Segments : 29 
Offset : 421 
1st Codon : 1 

QELAPIGHNRMYNMVPFFPPVTNEELFLTS
CAGGAACTGGCrCCCATTGGCCATAACAGAATGTATAACATGGTGCCTTTCTTTCCCCCTGTC 

Gene : TRP2 

Segment# : 30 
Offset : 436 
1st Codon : 1 

PFFPPVTNEELFLTSDQLGYSYAIDLPVSV
CCCTTTTTCCCTCCCGTCACCAATGAGGAACTGTTTCTGACAAGCGATCAGCTCGGCTATAGCTATGCCATTGACCTCCCCGTCAGCGTC 

Gene : TRP2 

Segment # : 31
Offset : 451
1st Codon : 1

DQLGYSYAIDLPVSVEETPGWPTTLLVVMG
GACCAACTGGGATACTCCTACGCTATCGATCTGCCTGTGTCCGTG 

Gene : TRP2 

Segment # : 32
Offset : 466 
1st Codon : 1 

EETPGWPTTLLVVMGTLVALVGLFVLLAFL 
GAGGAAACCCCTGGCTGGCCCACAACCCTCCTGGTCGTGATGGGCACACTG 

Gene : TRP2 

Segments : 33 
Offset : 481 

1st Codon : 1 

TLVALVGLFVLLAFLQYRRLRKGYTPLMET



Gene : TRP2

Segment # : 34
Offset : 496
1st Codon : 1

QYRRLRKGYTPLMETHLSSKRYTEEAAA
CAGTATAGGAGACTGAGAAAGGGATACACACCCCTCATGGAAACCCATCTGTCC^GCAAAAGGTATACCGAAGAGGCTGCCGCT 



Gene : MC1R

Segment # : 1
Offset : 1
1st Codon : 1

AAMAVQGSQRRLLGSLNSTPTAIPQLGLAA



Gene : MC1R

Segment # : 2
Offset : 16
1st Codon : 1

LNSTPTAIPQLGLAANQTGARCLEVSISDG
CTGAATAGCACACCCACAGCCATTCCCCAACTGGGACTGGCTGCCAATCAGACA 



Gene : MC1R

Segment # : 3
Offset : 31
1st Codon : 1

NQTGARCLEVSISDGLFLSLGLVSLVENAL
AACCAAACCGGAGCCAGATGCCTCGAGGTCAGCATTAGCGATGGCCTCTTCCTCAGCCTCGGCCTCGTGTCCCTGGTCGAGAATGCCCTC 



Gene : MC1R

Segment # : 4
Offset : 46
1st Codon : 1

LFLSLGLVSLVENALVVATIAKNRNLHSPM
rCXSTGGCTACCATTGCCAAAAACAGAAACCTCCACTCCCCCATG 



Gene : MC1R

Segment # : 5
Offset : 61
1st Codon : 1

VVATIAKNRNLHSPMYCFICCLALSDLLVS
GTGGTCGCCftO^TCGCTAAGAATAGGAATCTGCATAGCCCTATGT^ 



Gene : MC1R 

Segments : 6 
Offset : 76 
1st Codon : 1

YCFICCLALSDLLVSGTNVLETAVILLLEA
TACTGTTTC^TTTGCTGTCTGGCTCTGTCCGACCTCCrrGGTCftGCGGAACCAA 



Gene : MC1R

Segment # : 7
Offset : 91
1st Codon : 1

GTNVLETAVILLLEAGALVARAAVLQQLDN
GGCACAAACGTCCTGGAAACCGCTGTGATTCTGCTCCTGGA 



Gene : MC1R 

Segment # : 8
Offset : 106 
1st Codon : 1 

GALVARAAVLQQLDNVIDVITCSSMLSSLC 
GGCGCTCTGGTCGCCAGAGCCGCTGTGCTCCAGCAACTGGATAACGTCAT 



Gene : MC1R 

Segments : 9 
Offset : 121 
1st Codon : 1 

VIDVITCSSMLSSLCFLGAIAVDRYISIFY
GTGATTGACGTCATCACATGCTCCAGCATGCTCTCCAGCCT 



Gene : MC1R 

Segments : 10 
Offset : 136 
1st Codon : 1 

FLGAIAVDRYISIFYALRYHSIVTLPRAPR
TTCCTCGGCGCTATCGCTGTGGATAGGTATATCTCCATCTTTTAOSCrCTGAGATACCATAGCATTGTC 



Gene : MC1R

Segment # : 11
Offset : 151
1st Codon : 1

ALRYHSIVTLPRAPRAVAAIWVASVVFSTL
GCCCTCAGGTATCACTCCATCGTCACCCTCCCCAGAGCCCCTAGGGCTGTGGCTGCC 



Gene : MC1R 

Segments : 12 
Offset : 166 

1st Codon : 1 

AVAAIWVASVVFSTLFIAYYDHVAVLLCLV
GCCGTCGCCGCTATCTGGGTGGCTAGCGTCGTGTTTAGCACACTGTTTATCGCTTACTATGACCATGTGG 



Gene : MC1R

Segment # : 13
Offset : 181
1st Codon : 1

FIAYYDHVAVLLCLVVFFLAMLVLMAVLYV
RTGGCTGTGCTCTACGTC 



Gene : MC1R 

Segments : 14 
Offset : 196 

1st Codon : 1 

VFFLAMLVLMAVLYVHMLARACQHAQGIAR
GTGTTTTTCCTCGCCATGCTGGTCCrGATGGCCGTCCTTGT^ 



Gene : MC1R

Segment # : 15
Offset : 211
1st Codon : 1

HMLARACQHAQGIARLHKRQRPVHQGFGLK
CAC ATGCTGGCTAGGGCTTG CCAACACG CTC AGGGAATCX3CT AGGCTCC ACAAAAGGC AAAGGCCTG TGCATC AGGG ATTCGG ACTG AAA 

Gene : MC1R 

Segment # : 16 
Offset : 226 
1st Codon : 1 

LHKRQRPVHQGFGLKGAVTLTILLGIFFLC
CTGCATAAGAGACAGAGACCCGTCCACCAAGGCTTTGGCCTCAAGGGAGCCGTCACCCTCACCATTCTGCT 



Gene : MC1R 

Segments : 17 
Offset : 241 

1st Codon : 1 

GAVTLTILLGIFFLCWGPFFLHLTLIVLCP
GGCGCTGTGACACTGACAATCCTCCTGGGAATCTTTTTCCTCTGCTGGGGCCCrrTT 



Gene : MC1R

Segment # : 18
Offset : 256
1st Codon : 1

WGPFFLHLTLIVLCPEHPTCGCIFKNFNLF
TGGGGACCCTTTTTC
ATCTGTTT



Gene : MC1R 

Segment # : 19 
Offset : 271 

1st Codon : 1 

EHPTCGCIFKNFNLFLALIICNAIIDPLIY
GAGCATCCCACATG<^GATGCATTTTCAAAAACTTTAAC 

Gene : MC1R 

Segment # : 20
Offset : 286
1st Codon : 1

LALIICNAIIDPLIYAFHSQELRRTLKEVL
CTGGCTCTGATTATCTGTAACGCTATCATTGACCCTCTGATTTACGCITTC 

Gene : MC1R 

Segments : 21 
Offset : 301 
1st Codon : 1 

AFHSQELRRTLKEVLTCSWAA
GCCTTTCACTCCCAGGAACTGAGAAGGACACTGAAAGAGGTCCTGACATGCTCCTGGGCTGCC 

Gene : MUC1F 

Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMTPGTQSPFFLLLLLTVLTVVTGSGHAS
GCCGCTATGACACCCGGAACCCAAAGCCCTTTCTTTCTGCTCCTGCT 

Gene : MUC1F 

Segment # : 2
Offset : 16
1st Codon : 1

LLTVLTVVTGSGHASSTPGGEKETSATQRS
CTGCTCACCGTCCTGACAGTGGTC^CCGGAAGCGGACACGCrAGCTCCACCCCTGGCGGAGAGAAAGAG 

Gene : MUC1F 

Segments : 3 
Offset : 31 
1st Codon : 1 

STPGGEKETSATQRSSVPSSTEKNAVSMTS
AGCACACCCGGAGGCGAAAAGGAAACCTCCGCCACACAGAGAAGCTCCGTGCCTAGCTCCACCGAAAAGAATGCCGTCAGCATGACC^ 

Gene : MUC1F 

Segment # : 4 
Offset : 46 
1st Codon : 1

SVPSSTEKNAVSMTSSVLSSHSPGSGSSTT
AGCGTCCCCTCCAGCACAGAGAAAAACGCTGTGTCCATGACAAGCTCCGTGCTCAGCTCCCACTCC

Gene : MUC1F

Segment # : 5
Offset : 61
1st Codon : 1

SVLSSHSPGSGSSTTQGQDVTLAPATEPAS 
AGCGTCCTGTCCAGCCATAGCCCTGGCTCCGGCTCCAGCACAACCCAAG^ 

Gene : MUC1F 

Segment # : 6
Offset : 76
1st Codon : 1

QGQDVTLAPATEPASGSAATWGQDVTSVPV
CAGGGACAGGATGTGACACTGGCTCCCGCTACCGAACCCGCTAGCGGAAGCGCTGCCACATGGGGACAGGATG 

Gene : MUC1F 

Segments : 7 
Offset : 91 
1st Codon : 1 

GSAATWGQDVTSVPVTRPALGSTTPPAHDV
GGCTCCGCCGCTACCTGGGGCCAAGACGTCACCTCCGTGCCTGTG 

Gene : MUC1F 

Segments : 8 
Offset : 106 
1st Codon : 1 

TRPALGSTTPPAHDVTSAPDNKAA
ACCAGACCCGCTCTGGGAAGCACAACCCCTCCCGCTCACGATGTGAC^GCGCTCCCGATAACAAAGCCGCT 

Gene : MUC1R 

Segments : 1 
Offset : 1 
1st Codon : 1 

AANRPALGSTAPPVHNVTSASGSASGSAST 
GCCXSCTAACAGACCCGCTCrGGGAAGCACAGCCCCTCCCGTCCACAATGTGACAAGCGCTAGCGGAAGCGCTAG 

Gene : MUC1R

Segments : 2 
Offset : 16 
1st Codon : 1 

NVTSASGSASGSASTLVHNGTSARATTTPA
AACGTCACCTCCGCCTCCGGCTCCGCCTCCGGCTCCGCCTCCACCCTCGTGCATAACGGAACCTCCGCCAGAGCCAC^ 

Gene : MUC1R 

Segments : 3 
Offset : 31 
1st Codon : 1 

LVHNGTSARATTTPASKSTPFSIPSHHSDT
CTGGTCCACAATGGCACAAG03CTAGGGCTACCACAACCCCTGC 

Gene : MUC1R

Segment # : 4
Offset : 46
1st Codon : 1

SKSTPFSIPSHHSDTPTTLASHSTKTDASS 
AGCAAAAGCACACCCTTTAGCATTCCCTCCCACCATAGCGATACCCCTACCACACTC^CTAGCCATAGCACAAAGACAGACGCTA 

Gene : MUC1R 

Segments : 5 
Offset : 61 
1st Codon : 1 

PTTLASHSTKTDASSTHHSSVPPLTSSNHS
CCCACAACCCTCGCCTCCCACTCCACCAAAACCGATGCCTCC^GCACACACCATAGCTCCGTGCCrCC 

Gene : MUC1R

Segments : 6 
Offset : 76 
1st Codon : 1 

THHSSVPPLTSSNHSTSPQLSTGVSFFFLS
ACCCATCACTCCAGCGTCCCCCCTCTGACAAGCTCCAACCATAG 

Gene : MUC1R

Segment # : 7
Offset : 91 
1st Codon : 1 

TSPQLSTGVSFFFLSFHISNLQFNSSLEDP
ACCTCCCCCCAACTGTCCACCGGAGTGTCCTTCTrTTTCCTCAGCTTTCACATTAGCAATCTGCAATTCAATAGCT 

Gene : MUC1R 

Segment # : 8
Offset : 106 
1st Codon : 1 

FHISNLQFNSSLEDPSTDYYQELQRDISEM 
TTCCATATCTCCAACCTCCAGTTTAACTCCAGCCTCGAGGATCCCTCCACCGATTACTATCAGGAACTGCA 

Gene : MUC1R

Segment # : 9
Offset : 121
1st Codon : 1

STDYYQELQRDISEMFLQIYKQGGFLGLSN
AGCACAGACTATTACCAAGAGCTCCAGAGAGACATTAGCGAAATGTTTCTGCAAATCTATAAGCAAGGCGGATTCCT 

Gene : MUC1R 

Segments : 10 
Offset : 136 

1st Codon : 1 

FLQIYKQGGFLGLSNIKFRPGSVVVQLTLA
TTCCTCCAGATTTACAAACAGGGAGGCTTTCTGGGACTGTCCAACATTAAGTTTAGGCCTC 

Gene : MUC1R 

Segments : 11 
Offset : 151 
1st Codon : 1 

IKFRPGSVVVQLTLAFREGTINVHDVETQF 
ATCAAATTCAGACCCGGAAGCGTCGTGGTCCAGCTCACCCTCGCCTTTAGG^ 

Gene : MUC1R 

Segment # : 12
Offset : 166
1st Codon : 1

FREGTINVHDVETQFNQYKTEAASRYNLTI
TTCAGAGAGGGAACCATTAACGTCCACGATGTGGAAACCCAATTCAATCAGTATAAGAC^ 

Gene : MUC1R 

Segments : 13 
Offset : 181 
1st Codon : 1 

NQYKTEAASRYNLTISDVSVSDVPFPFSAQ
AACCAATACAAAACCGAAGCCGCTAGCAGATACAATCTTGACAATCTCCGACGTCAGCXaTC 



Gene : MUC1R

Segment # : 14
Offset : 196
1st Codon : 1

SDVSVSDVPFPFSAQSGAGVPGWGIALLVL
CCTTTCCCTTTAGCGCTCAGTCCGGCGCTGGCGTCCC^TGGATGGGGAATCGCTCTGCTCGTGCT

Gene : MUC1R

Segment # : 15
Offset : 211
1st Codon : 1

SGAGVPGWGIALLVLVCVLVALAIVYLIAL
AGCGGAGCCGGAGTGCCTGGCTGGGGCATTGCCCTCCTGGTCCTGGTCTGCGTC

Gene : MUC1R

Segment # : 16
Offset : 226
1st Codon : 1

VCVLVALAIVYLIALAVCQCRRKNYGQLDI
GTGTGTGTGCTCGT^CTCTGGCTATCGTCTACCTCATCGCTCTGGCTGTGTGTCAGTG

Gene : MUC1R

Segment # : 17
Offset : 241
1st Codon : 1

AVCQCRRKNYGQLDIFPARDTYHPMSEYPT 
GCCGTCTGCCAATGCAGAAGGAAAAACTATGGCCAACTGGATATCTTTCCCGC™ 

Gene : MUC1R 

Segment # : 18
Offset : 256 
1st Codon : 1 

FPARDTYHPMSEYPTYHTHGRYVPPSSTDR 
TTCCCTGCCAGAGACACATACCATCCCATGAGCGAATACCCTACCTATCAC^CACACGGAAGGT 

Gene : MUC1R 

Segment # : 19
Offset : 271
1st Codon : 1

YHTHGRYVPPSSTDRSPYEKVSAGNGGSSL 
TACCATACCCATGGCAGATACGTCCCCCCTAGCrCCACCXSATAGGTCCCCCTATGAGAAAGTGTCCGCCGGAAACGGAGGCTCCAGCCTC 

Gene : MUC1R 

Segments : 20 
Offset : 286 
1st Codon : 1 

SPYEKVSAGNGGSSLSYTNPAVAAASANLA
AGCCCTTACGAAAAGGTCAGOSCTGGCAATGGCGGAAGCTCCCTGTC 

Gene : MUC1R 

Segments : 21 
Offset : 301 

1st Codon : 1 

SYTNPAVAAASANLAA 
AGCTATACCAATCCCGCTGTGGCTGCCGCTAGCGCTAACCTCGCCGCT 
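
The listing above tabulates each parent protein as 30-residue segments whose offsets advance by 15 residues, so that neighbouring segments overlap by 15 residues and the final segment simply runs to the end of the protein. A minimal sketch of that tiling, and of a reordering step, is given below. The function names `tile_segments` and `scramble`, the default parameters and the use of a seeded shuffle are illustrative assumptions and are not taken from the specification; the test peptide is the start of the Tyros listing.

```python
import random


def tile_segments(parent, length=30, step=15):
    """Split a parent sequence into overlapping segments.

    Returns (segment_number, offset, peptide) tuples with 1-based offsets.
    Tiling stops once a segment reaches the end of the parent sequence.
    """
    segments = []
    start = 0
    while True:
        peptide = parent[start:start + length]
        segments.append((len(segments) + 1, start + 1, peptide))
        if start + length >= len(parent):
            break
        start += step
    return segments


def scramble(segments, seed=0):
    """Return the segments in a shuffled order (hypothetical seeded shuffle)."""
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    return shuffled


# First 45 residues of the Tyros listing (segments 1 and 2).
tyros_start = "AAMLLAVLYCLLWSFQTSAGHFPRACVSSKNLMEKECCPPWSGDR"
for number, offset, peptide in tile_segments(tyros_start):
    print(number, offset, peptide)
```

With the default parameters the offsets come out as 1, 16, 31 and so on, matching the Offset fields in the listing; a shorter final segment arises whenever the parent length is not a multiple of the step.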

Segments in scrambled order: 



gp100 #4

WNRQLYPEWTEAQRLDCWRGGQVSLKVSND
TGGAATAGGCAACTGTATCCCGAATGGACAGAGGCTCAGAGACTGGATTGCTGGAGGGGAG^ 

TRP2 #6 

PYILRNQDDRELWPRKFFHRTCKCTGNFAG 
CCCTATATCCTCAC^AATCAGGATGACAGAGAGCTCTGGCCTAGGAAATTCTTTCACAGAACCTGT 

Tyros #30

RNGDFFISSKDLGYDYSYLQDSDPDSFQDY
AGGAATGGCGATTTCTTTATCTTCCAGCAAAGACCTCGGCTATGACTATAGCTATCTGCA 

TRP-1 #1 

AAPAFLTWHRYHLLRLEKDMQEMLQEPSFS
GCCGCTCCCGCTTTCCTCACCTGGCACAGATACCATCTGCTCAGGCTCGAGAAAGACATGCAGGAAATC 

Tyros #29 

GHNRESYMVPFIPLYRNGDFFISSKDLGYD
GGCCATAACAGAGAGTCCTACATGGTGCCTTTCATTCCCCTCTACAGAAACGGAGACTTTTTCATTAGCT 

TRP2 #16 

LLCLERDLQRLIGNESFALPYWNFATGRNE 
CITICTCTGCCTCGAGAGAGACCTCCAGAGACTGATTCGCAATGAGT^ 

gp100 #23

TTEVVGTTPGQAPTAEPSGTTSVQVPTTEV
ACCACAGAGGTCGTGGGAACCACACCCGGACAG^CTCCCACAGCCGAACCCTCCGGCACAACCTCCGTGCAA 

MUC1R #9 

STDYYQELQRDISEMFLQIYKQGGFLGLSN
AGCACAGACTATTACCAAGAGCTCCAGAGAGACATTAGCGAAATGTTTCTGCAAATCrATAAGCAAGGCC^ 

gp100 #36

ACMEISSPGCQPPAQRLCQPVLPSPACQLV
GCCTGTATGGAAATCTCCAGCCCTGGCrrGTCAGCCTCCC<XrrCAGAGACTC 

TRP2 #31 

DQLGYSYAIDLPVSVEETPGWPTTLLVVMG
GACCAACTCGGATACTCCTACGCTATCGATCTCX:CIX;T^

TRP-1 #7

TEDGPIRRNPAGNVARPMVQRLPEPQDVAQ 
ACCGAAGACGGACCCATTAGGAGAAACCCTGCCGGAAACGTCGCCAGACCCATGGTGCAAAGGCTCCCCGAACCCCAAGACGTCGCCCAA 

TRP2 #3 

CMTVDSLVNKECCPRLGAESANVCGSQQGR
TGCATGACCGTCGACTCCCTGGTCAACAAAGAGTGTTGCCCTAGGCTCGGCGCTGAGTCCGCCAATGTGTGTGGCTCCCAGCAAGGCAGA 

MUC1R #13 

NQYKTEAASRYNLTISDVSVSDVPFPFSAQ
AACCAATACAAAACCGAAGCCGCTAGCAGATACAATCTGACAATCTCCGACGTCAGCGTCAGCGATGTGCCTTTCCCTrrCTCCGCCCAA 

TRP2 #1 

AAMSPLWWGFLLSCLGCKILPGAQGQFPRV
GCCGCTATGTCCCCCCTCTGGTGGGGCTTTCTGCTCAGCTGTCTGGGATGCAAAATCCTCCCCGGAGCCCAA 

gp100 #18

ADLSYTWDFGDSSGTLISRALVVTHTYLEP
GCCGATCTGTCCTACACATGGGATTTCGGAGACTCCAGCGGAACCCTCATCTCCAGGGCTCTGGTCX3TGACACACACATAC 

gp100 #27

LAEMSTPEATGMTPAEVSIVVLSGTTAAQV
CTGGCTGAGATGAGCACACCCGAAGCCACAGGCATGACCCCTGCCGAAGTGTCCATC 

MUC1R #11 

IKFRPGSVVVQLTLAFREGTINVHDVETQF
ATCAAATTCAGACCCGGAAGCGTCGTGGTCCAGCTGACCCTCGCCTTTAGGGAAGGCACAATCAATGTGCATGACGTCGAGACACAGTTT 

MUC1F #7 

GSAATWGQDVTSVPVTRPALGSTTPPAHDV
GGCTCCGCCGCTACCTGGGGCCAAGACGTCACCTCCGTGCCTGTGACAAGGCCTGCCCTCGGCTC 

MC1R #16 

LHKRQRPVHQGFGLKGAVTLTILLGIFFLC
CTGCATAAGAGACAGAGACCCGTCCACCAAGGCTTTGGCCTCAAGGGAGCCGTCACCCTCACCATTCTCCTCGGCATTTTCTTTCTG 

MC1R #20 

LALIICNAIIDPLIYAFHSQELRRTLKEVL
CTGGCTCTGATTATCTGTAACGCTATCATTGACCCTCTGATTTACGCTTTCCATAGCCAAGAGCTCAGGAGAACCCTC 

TRP2 #7 

KFFHRTCKCTGNFAGYNCGDCKFGWTGPNC 
AAGTTTTTCCATAGGACATGCAAATGCACAGGCAATTTCGCTGGCTATAACTGTGGCGATTGCA 

TRP2 #23 

LSLQKFDNPPFFQNSTFSFRNALEGFDKAD
CTGTCCCTGCAAAAGTTTGACAATCCCCCTTTCTTT 

MUC1R #4 

SKSTPFSIPSHHSDTPTTLASHSTKTDASS 
AGCAAAAGCACACCCTTTAGCATTCCCTCCCACCATAGCGATACCCCTACCACACTGGCTAGGCATAGCACAAAGACAGACGCTAGCTCC 



MUC1R #1

AANRPALGSTAPPVHNVTSASGSASGSAST



TRP2 #21 

CNGTYEGLLRRNQMGRNSMKLPTLKDIRDC 
TGCAATGGCACATACGAAGGCCTCCTGAGAAGGAATCAGATGGGCAGAAACTCCATGAAACTGCCTACCCTCAAGGATATCAGAGACTGT 

MUC1R #6 

THHSSVPPLTSSNHSTSPQLSTGVSFFFLS
ACCCATCACTCCAGCGTCCCCCCTCTGACAAGCTCCAACCATAGCACAAGCCCTCAGCTCAGCACAGGCGTCAGCTTTTTCTTTCTGTGC 



MC1R #13 

FIAYYDHVAVLLCLVVFFLAMLVLMAVLYV
TTCATTGCCTATTACGATCACGTCGCCGTCCTGCTCTGCCTCGTGGTCTTCTTTC 



Tyros #16 

KLTGDENFTIPYWDWRDAEKCDICTDEYMG
AAGCTCACCGGAGACGAAAACTTTACC^TTCCCTATTGGGATTGGAGAGACGCTGAGAAATGCGATATCTGTACCGATGAGTATATGGGA

gp100 #32

LRLVKRQVPLDCVLYRYGSFSVTLDIVQGI
CTGAGACTGGTCAAGAGACAGGTCCCCCTCGACTGTGTGCTCTACAGATACGGAAGCTTTAGCGTCACCCTCGACATTGTGCAAGGCATT

MUC1R #10 

FLQIYKQGGFLGLSNIKFRPGSVVVQLTLA
TTCCTCCAGATTTACAAACAGGGAGGCTTTCTGGGACTGTCCAACATTAAGTTTAGGCCT^ 

MC1R #9

VIDVITCSSMLSSLCFLGAIAVDRYISIFY
GTGATTGACGTCATCACATGCTCCAGCATGCTCTCCAGCCTCTGCTTTCTGGGAGC 

Tyros #21 

RNPGNHDKSRTPRLPSSADVEFCLSLTQYE 
AGGAATCCO^AAACCATGACAAAAGCAGAACCCCTAGGCTCCCCTCCAGCGCTGACGTCGAGTTTTC 

TRP-1 #14 

FDEWLRRYNADISTFPLENAPIGHNRQYNM
TTCGATGAGTGGCTGAGAAGGTATAACGCTGACATTAGCACATTCCCTCTGGAAAAC 

gp100 #39

VSLADTNSLAVVSTQLIMPGQEAGLGQVPL
GTGTCCCTGGCTGACACAAACTCCCTGGCTGTGGTCAGCACACAGCTCATCATGCCCG^ 

gp100 #20

GPVTAQVVLQAAIPLTSCGSSPVPGTTDGH
GGCCCTGTGACAGCCCAAGTGGTCCTGCAAGCCGCTATCCCTCTGACAAGCTGTGGCTCC 

Tyros #8 

KFGFWGPNCTERRLLVRRNIFDLSAPEKDK
AAGTTTGGCTTTTGGGGACCCAATTGCACAGAGAGAAGGCTCCTGGTCAGGAGAAACATTTTCGATCTGTC 

gp100 #13

LGTHTMEVTVYHRRGSRSYVPLAHSSSAFT 
CTGGGAACCCATACCATGGAGGTCACCGTCTACCATAGGAGAGGCTCCAGGTCCTACGTCCCCCTCGCCCATAGCTCCAGCGCTTTC^ 

MC1R #12 

AVAAIWVASVVFSTLFIAYYDHVAVLLCLV
GCCGTCGCCGCTATCTGGGTGGCTAGCGTCGTGTTTAGCAC^CTGTTTO 

TRP2 #25 

GTLDSQVMSLHNLVHSFLNGTNALPHSAAN 
GGCACACTGGATAGCCAAGTGATGAGCCTCCACAATCTGGTCCACTCCITCCT 

MART #4 

GCWYCRRRNGYRALMDKSLHVGTQCALTRR 
GGCTGTItK3TA7TGCAGAAGGAGAAACGGATACAGAGCCCTCATGGATAAGTCCCTGCATGTGGGAACC 

Tyros #15 

PWHRLFLLRWEQEIQKLTGDENFTIPYWDW
CCCTGGCACAGACTGTTTCTGCTCAGGTGGGAGCAAGAGATTCAGAAACTGACAGGCGATGAGAATTT 

MC1R #1 

AAMAVQGSQRRLLGSLNSTPTAIPQLGLAA
GCCGCTATGGCTGTGCAAGGCTCCCAGAGAAGGCTCCTGGGAAGCCTCAACTCCACCCCTACCGCTATCCCT 

MC1R #5 

VVATIAKNRNLHSPMYCFICCLALSDLLVS
GTGGTCGCCACAATCGCTAAG AAT AGG AATCTGCATAGCCCTATGTA'l"! "GCJTTATCTGTTG CCTCGCCCTCAGCG ATCTG CTCGTGTCC 

Tyros #25 

QSSMHNALHIYMNGTMSQVQGSANDPIFLL
CAGTCCAGCATGCACAATGCCCTCCACATTTACATGAACGGAACC^ 

Tyros #18 

GQHPTNPNLLSPASFFSSWQIVCSRLEEYN
GGCCAACACCCTACCAATCCCAATCTGCTCAGCCCTGCCTCCTTCTTTAGCTCCTGGCAAATCGTCTC 

MC1R #6 

YCFICCLALSDLLVSGTNVLETAVILLLEA
TACTGTTTCATTTGCTGTCTGGCTCTGTCCGACCTCCTGGTCAGCGGAACCAATGTGCTCGAGACAGCC

TRP2 #19

DPTLISRNSRFSSWETVCDSLDDYNHLVTL
GACCCTACCCTCATCTCCAGGAATAGCAGATTCTCCAGCTGGGAGACAGTGTGTGACTCCCTGGATGACTATAACCATC 

MUC1F #8

TRPALGSTTPPAHDVTSAPDNKAA
ACCAGACCCGCTCTGGGAAGCACAACCCCTCCCGCTCACGATGTGACAAGCGCTCCCGATAACAAAGCCGCT 

Tyros #17 

RDAEKCDICTDEYMGGQHPTNPNLLSPASF
AGGGATGCCGAAAAGTGTGACATTTGCACAGACGAATACATGGGCGGACAGCATCCCACAAACCCTAACCTCCTGTCCCCCGCTAGCTTT 

gp100 #17

TFALQLHDPSGYLAEADLSYTWDFGDSSGT
ACCTTTGCCCTCCAGCTCCACGATCCCTCCGGCTATCTGGCrrcAGGCTGACCTCAGCTATACCT 

Tyros #22 

SSADVEFCLSLTQYESGSMDKAANFSFRNT 
AGCTCCGCCGATGTGGAATTCTGTCTGTCCCTGACACAGTATGAGTCCGGCTCCATGGATAAGGCTGCCA^ 

gp100 #6

GPTLIGANASFSIALNFPGSQKVLPDGQVI
GGCCCTACCCTCATCGGAGCCAATGCCTCCTTCTCCATCGCTCTGAATTTCCCTC 

MC1R #18 

WGPFFLHLTLIVLCPEHPTCGCIFKNFNLF
TGGGGACCCTTTTTCCTCCACCTCACCCTCATCGTCCTGTGTCCCGAACACCCTAC^ 

Tyros #7 

CQCSGNFMGFNCGNCKFGFWGPNCTERRLL 
TGCCAATGCTCCGGCAATTTCATGGGCTTTAACTGTGGCAATTGCAAATTCGGATTCTGGG^ 

TRP2 #34

QYRRLRKGYTPLMETHLSSKRYTEEAAA
CAGTATAGGAGACTGAGAAAGGGATACACACCCCTCATGGAAACCCATCTGTCCAGCAAAAGGTATACCGAAGAGGCTGCCGCT 

TRP-1 #15

PLENAPIGHNRQYNMVPFWPPVTNTEMFVT
CCCCTCGAGAATGCCCCTATCX^ACACAATAGGCAATACAATATGGTCCCCTTTTGGCCTCCCGTCACCAATA 

gp100 #7

NFPGSQKVLPDGQVIWVNNTIINGSQVWGG
AACTTTCCCGGAAGCCAAAAGGTCCTGCCTGACGGACAGGTCATCTGGGTGAATAACACAATCATTAACGGAAGCCAAG 

gp100 #22

RPTAEAPNTTAGQVPTTEVVGTTPGQAPTA
AGGCCTACCGCTGAGGCTCCCAATACCACAGCCGGACAGGTCCCCACAACCGAAGTGGTCGGCACAACCCCTG 

MUC1F #3

STPGGEKETSATQRSSVPSSTEKNAVSMTS
AGCACACCCGGAGGCGAAAAGGAAACCTCCGCCACACAGAGAAGCTCCGTGCCTAGCTGCACCGAAAAGAATGCCGTO 

gp100 #42

LIYRRRLMKQDFSVPQLPHSSSHWLRLPRI
CTGATTTACAGAAGGAGACTGATGAAGCAAGACTTTAGCGTCCCCCAACTGCCTCACTCGAGCTCCCACTGGCTG 

TRP2 #12 

LGLLGPNGTQPQFANCSVYDFFVWLHYYSV
CTGGG ACTGCTCGG CCCTAACGGAAGCCAACCCCAATTCGCT AACTGT AGCGTCTACG ATTTCTTTGTGTGG CTGCATTA CT ATAGCGTC 

TRP-1 #9 

CLEVGLFDTPPFYSNSTNSFRNTVEGYSDP
TGCCTCGAGGTCGGCCTCTTCGATACCCCTCCCTTTTACTCCAACTCCACCAATAGCTTTAGGAATACCGTCGAGG 

gp100 #1

AAMDLVLKRCLLHLAVIGALLAVGATKVPR



MC1R #3 

NQTGARCLEVSISDGLFLSLGLVSLVENAL

Tyros #23

SGSMDKAANFSFRNTLEGFASPLTGIADAS 
AGCGGAAGCATGGACAAAGCCGCTAACTTTAGCTTTAGGAA^ 

Tyros #4 

SPCGQLSGRGSCQNILLSNAPLGPQFPFTG
AGCCCTTGCGGACAGCTCAGCGGAAGGGGAAGCTGTCAGAATATCCTCCTC 

Tyros #13 

MHYYVSMDALLGGSEIWRDIDFAHEAPAFL
ATGCATTACTATGTGTCCATGGATGCCCTCCTGGGAGGCTCCGAGATTTGGAGAGACATTGACTTTGC 

Tyros #35 

EEKQPLLMEKEDYHSLYQSHLAA 
GAGGAJU^GCAACCCCTCCTGATGGAGAAAGAGGATTACCATAGCCT 

TRP2 #5 

GQCTEVRADTRPWSGPYILRNQDDRELWPR
GGCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTCCTC 

MUC1F #4 

SVPSSTEKNAVSMTSSVLSSHSPGSGSSTT 
AGCGTCCCCTCCAGCACAGAGAAAAACGCTGTGTCCATGACAAGCTCCGTGCTCAGCTCCCACTCCCCCGGAAG 

Tyros #12 

TPMFNDINIYDLFVWMHYYVSMDALLGGSE
ACCCCTATGTTTAACGATATCAATATCTATGACCTCTTCGTCTGGATGCACTATTACGTCAGCATG^ 

gp100 #9

QPVYPQETDDACIFPDGGPCPSGSWSQKRS
CAGCCTGTGTATCCCCAAGAGACAGACXSATGCCTGTATCTTTCCCGATGGCGGACCCTGTCCCTCCGGCT 

TRP-1 #6

DSLEDYDTLGTLCNSTEDGPIRRNPAGNVA
GACTCCCTGGAAGACTATGACACACTGGGAACCCTCTGCAATAGCACAGAGGA 

gp100 #8

WVNNTIINGSQVWGGQPVYPQETDDACIFP
TGGGTCAACAATACCATTATCAATGGCTCCCAGGTCTGGGGAGGCCAACCCGTCTACCCTCAGGAAACCGATGAC 

MART #7 

QEKNCEPVVPNAPPAYEKLSAEQSPPPYSP 
CAGGAAAAGAATTGCGAACCCGTCGTGCCTAACGCTCCCCCTGCCTATGAGAAACTGTCCGCCGAACAGTCCCCCCCTCCCTATAGCCCT 

gp100 #14

SRSYVPLAHSSSAFTITDQVPFSVSVSQLR 
AGCAGAAGCTATGTCCCTCTGGCTCACTCCAGCTCCGCCTTTACCAT^ 

TRP-1 #2 

LEKDMQEMLQEPSFSLPYWNFATGKNVCDI 
CIXK5AAAAGGATATGCAAGAGATGCTGCAAGAGCCTAGCTTTAGCCTCCCCTATTG 

TRP-1 #16 

VPFWPPVTNTEMFVTAPDNLGYTYEAA 
GTGCCTTTCTGGCCCCCTGTGACAAAC^CAGAGATGTTCGTCACCGCTCCCGATAACCTCGG 

TRP2 #13 

CSVYDFFVWLHYYSVRDTLLGPGRPYRAID 
TGCTCCGTGTATGACTTTTTCGTCTGGCTCCACTATTACTCCGTGAGAGACACACTG 

Tyros #9 

VRRNIFDLSAPEKDKFFAYLTLAKHTISSD
GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCGAAAAGGATAAGTTTTTCGCTTACCTCA 

MART #2 

KKGHGHSYTTAEEAAGIGILTVILGVLLLI
AAGAAAGGCCATGGCCATAGCTATACCACAGCCGAAGAGGCTGCCGGAATCGGAATCCTCACCGTCATCCTCGGCGTC 

gp100 #11

FVYVWKTWGQYWQVLGGPVSGLSIGTGRAM
TTCGTCTACGTCTGGAAAAC<TTGGGGCCAATACTGGCAGGTCCTGGGAGGCCCTGT€TCCGGCCTCAG

gp100 #12

GGPVSGLSIGTGRAMLGTHTMEVTVYHRRG
GG<X^ACCCGTCAGCGGACTGTCCATCGGAACCGGAAGGGCTATGCTCGGCACACACACAATGGAAGTGACAG 

gp100 #25

ISTAPVQMPTAESTGMTPEKVPVSEVMGTT 
ATCTCCACCGCTCCCGTCCAGATGCCCACAGCCGAAAGCACAGGCATGACCCCTGAGAAAGTGCCTGTGTCCGAGGTCATGGGAACCACA 

Tyros #19 

FSSWQIVCSRLEEYNSHQSLCNGTPEGPLR 
TTCTCCAGCTGGCAGATTGTGTGTAGCAGACTGGAAGAGTATAACTCCCACCAAAGCCTCTGCAATGGCACACCCGAAG 

TRP2 #27 

DPIFVVLHSFTDAIFDEWMKRFNPPADAWP
GACCCTATCTTTGTGGTCCTGCATAGCTTTACCGATGCCATTTTCGATGAGTGGATGAAAAGGTTTAACCCTC 

MC1R #15 

HMLARACQHAQGIARLHKRQRPVHQGFGLK
CACATGCTGGCTAGGGCTTGCCAACACGCTCAGGGAATCGCrAGGCTCCACAAAAGGC 

MUC1F #2 

LLTVLTVVTGSGHASSTPGGEKETSATQRS
CTGCTCACCGTCCTGACAGTGGTCACCGGAAGCGGACACGCTAGCTCCACCCCTGGCGGAGAGAAAGAGACAAGCG 

gp100 #44

FCSCPIGENSPLLSGQQVAA
TTCTGTAGCTGTCCCATTGGCGAAAACTCCCCCCTCCTGTCCGGCCAACAGGTCGCCGCT 

TRP2 #24 

TFSFRNALEGFDKADGTLDSQVMSLHNLVH 
ACCTTTAGCTTTAGGAATGCCCTCGAGGGATTCGATAAGGCTGACGGAAC^ 

Tyros #20

SHQSLCNGTPEGPLRRNPGNHDKSRTPRLP
AGCCATCAGTCCCTGTGTAACGGAACCCCTGAGGGACCCCTCAGGAGAAACCCTGGCAATCACGATAAGTCCAGGAC 

TRP2 #30 

PFFPPVTNEELFLTSDQLGYSYAIDLPVSV
CCCTTTTTCCCTCCCGTCACCAATGAGGAACTGTTTCTGACAAGCGATCAGCTCGGCTATAGCTATGCCATTGA 

TRP2 #9 

ERKKPPVIRQNIHSLSPQEREQFLGALDLA
GAGAGAAAGAAACCCCCTGTGATTAGGCAAAACATTCACTCCCTGTCCCCCCAA 

TRP2 #29 

QELAPIGHNRMYNMVPFFPPVTNEELFLTS
CAGGAACTGGCTCCCATTGGCCATAACAGAATGTATAACATGGTGCCTTTCTTTCCCC^ 

gp100 #28

EVSIVVLSGTTAAQVTTTEWVETTARELPI
GAGGTCAGCATTGTGGTCCTGTCCGGCACAACCGCTGCCCAAGTGACAACCACAGAGTGGGTGGAAACCACAGCCAGAGAGCTCCCCATT 

MUC1R #7 

TSPQLSTGVSFFFLSFHISNLQFNSSLEDP 
ACCTCCCCCCAACTGTCCACCGGAGTGTCCTTCTTTTTCCrCA 

MUC1R #19 

YHTHGRYVPPSSTDRSPYEKVSAGNGGSSL
TACCATACCCATGGCAGATACX3TCCCCCCTAGCTCCACCGATAGGTCCCGCTATGAGAAAGTGTCCGCCGGAAACGGAGGCTCCAGCCTC 

MC1R #4 

LFLSLGLVSLVENALVVATIAKNRNLHSPM 
CTGTTTCTGTCCCTGGGACTGGTCAGCCTCGTGGAAAACGCT^ 

TRP2 #26 

SFLNGTNALPHSAANDPIFVVLHSFTDAIF
AGCTTTCTGAATGGCACAAA<X3CTCTGCCTCACTCCGCCGCTA 

MUC1R #17 

AVCQCRRKNYGQLDIFPARDTYHPMSEYPT
GCCGTCTGCCAATGCAGAAGGAAAAACTATGGCCAACTGG

MC1R #14

VFFLAMLVLMAVLYVHMLARACQHAQGIAR 
GTGTTTTTCCTCGCCATGCTGGTCCTGATGGCCGTCCTGTATGTC 

TRP-1 #10 

STNSFRNTVEGYSDPTGKYDPAVRSLHNLA
AGCACAAACTCCTTCAGAAACACAGTGGAAGGCTATAGCGATCCCACAGGCAAATACGATCCCGCTGTGAGAAGCCTCCACAATCTGGCT

TRP-1 #3 

LPYWNFATGKNVCDICTDDLMGSRSNFDST
CTGCCTTACTGGAACTTTGCCACAGGCAAAAACGTCTG 

gp100 #15

ITDQVPFSVSVSQLRALDGGNKHFLRNQPL
ATCACAGACCAAGTGCCTITTCTCCGTGTCCGTGTCCCAGCTCAGGGCTCTGGATGGCGGAAACAAACA 

MUC1R #8

FHISNLQFNSSLEDPSTDYYQELQRDISEM
TTCCATATCTCCAACCTCCAGTTTAACTCCAGCCTCGAGG 

MUC1R #20 

SPYEKVSAGNGGSSLSYTNPAVAAASANLA
AGCCCTTACGAAAAGGTCAGCGCTGGCAATGGCGGAAGCTCCCTGTCCTACACAAACCCTGCCGT 

Tyros #11 

YVIPIGTYGQMKNGSTPMFNDINIYDLFVW
TACGTCATCCCTATCGGAACCTATGGCCAAATGAAAAACGGAAGCACACCCATGT^ 

gp100 #37

RLCQPVLPSPACQLVLHQTLKGGSGTYCLN
AGGCTCTGCCAACCCGTCCTGCCTAGCCCTGCCTGTCAGCTCGTGCTC^ 

gp100 #33

RYGSFSVTLDIVQGIESAEILQAVPSGEGD 
AGGTATGGCTCCTTCTCCGTGACACTGGATATCGTCCAGGGAATCGAAAGCGCTGAGATTCTGCAAGCCGTCCCCTCCGGCGAAGGCGAT 

Tyros #27 

HHAFVDSIFEQWLQRHRPLQEVYPEANAPI
CACCATGCCTTTTGTGGATAGCATTTrCGAACAGTGGCTGCAAAGGCATAGGCCTCTGCAAGAGGTCTACC 

TRP-1 #4 

CTDDLMGSRSNFDSTLISPNSVFSQWRVVC
TGCACAGACGATCTGATGGGCTCCAGGTCCAACTTTGACTCCACCCTCATCTCCCCCAATAGCGTCTTCT 

MUC1R #18 

FPARDTYHPMSEYPTYHTHGRYVPPSSTDR
TTCCCTGCCAGAGACACATACCATCCCATGAGCGAATACCCTACCTATCACACAC^ 

MUC1R #21 

SYTNPAVAAASANLAA 
AGCTATACCAATCCCGCTGTGGCTGCCGCTAGCGCTAACCTCGCCGCT

MC1R #19 

EHPTCGCIFKNFNLFLALIICNAIIDPLIY
GAGCATCCCACATGCGGATGCATTTTCAAA^CTTTAACCTCTTCCTCGCCCTCATCATTTC 

Tyros #26 

MSQVQGSANDPIFLLHHAFVDSIFEQWLQR
ATGTCCCAGGTCCAGGGAAGCGCTAACGATCCCATTTTCCTCCTGCATCACGCTTTCGTCGACTC 

TRP2 #22 

RNSMKLPTLKDIRDCLSLQKFDNPPFFQNS
AGGAATAGCATGAAGCTCCCCACACTGAAAGACATTAGGGATTGCCTCAGCCTCCAGAAATTCGATAACCCT 

gp100 #19

LISRALVVTHTYLEPGPVTAQVVLQAAIPL
CTGATTAGCAGAGCCCTCGTGGTCACCCATACCTATCTGGAACCCGGACCCGTCACCGCTCAGGTCGTG 

TRP2 #17 

SFALPYWNFATGRNECDVCTDQLFGAARPD
AGCTTTGCCCTCCCCTATTGGAATTTCGCTACCGGAAG<3AATGAGTGTGACGTCTGCACAGACCAAC^

gp100 #2

VIGALLAVGATKVPRNQDWLGVSRQLRTKA 
GTGATTGGCGCTCTGCTCGCCGTCGGCGCTACCAAAGTGCCTAGGAATCAGGATTGGCT 

gp100 #16

ALDGGNKHFLRNQPLTFALQLHDPSGYLAE 
GCCCTCGACGGAGGa^TAAGCATTTCCTCAGGAATCAGCCTCTGACATTCGCTCTGCAACTGCATGACCC^ 

TRP2 #18 

CDVCTDQLFGAARPDDPTLISRNSRFSSWE
TGCGATGTGTGTACCGATCAGCTCTTOTGAGCCGCTAGGCCTGACGATCCaiCACTGATTAGCAGAAACTC 

MART #1 

AAMPREDAHFIYGYPKKGHGHSYTTAEEAA 
GCCGCTATGCCTAGGGAAGACGCTCACTTTATCTATGGCTATCCCAAAAAGGGACACGGACACT 

TRP-1 #11 

TGKYDPAVRSLHNLAHLFLNGTGGQTHLSS 
ACCGGAAAGTATGACCCTGCCGTCAGGTCCCTCCATAACCTCGCCCATCTCTTTCT 

MUC1R #14 

SDVSVSDVPFPFSAQSGAGVPGWGIALLVL 
AGCGATGTGTCCGTGTCCGACGTCCCCTTTCCCTTTAGCGCTCAGTCCGGCGCTGGC 

TRP2 #10 

SPQEREQFLGALDLAKKRVHPDYVITTQHW
AGCCCT(^GGAAAGGGAACAGTTTCTGGGAGCCCTCGACCTCX3CCAAAAAGAGAGTGCATCCCGATTACGTC^ 

Tyros #10 

FFAYLTLAKHTISSDYVIPIGTYGQMKNGS
TTCTTTGCCTATCTGACACTGGCTAAGCATACCATTAGCTCCGACTATGTGATTCCCA 

MC1R #7

GTNVLETAVILLLEAGALVARAAVLQQLDN
GGCACAAACGTCCTGGAAACCGCTGTGATTCTGCTCCTGGAAGCCGGAGCCCTCGTGGCTAGGGCTG 

MUC1R #16 

VCVLVALAIVYLIALAVCQCRRKNYGQLDI
GTGTGTGTGCTCGTGGCTCTGGCTATCGTCTACCTCATCGCTCTGG 

MART #6 

CPQEGFDHRDSKVSLQEKNCEPVVPNAPPA 
TGCCCTCAGGAAGGCTTTGACCATAGGGATAGCAAAGTCTCCCTCCAAGAGAAAAACT^ 

MUC1F #5 

SVLSSHSPGSGSSTTQGQDVTLAPATEPAS
AGCGTCCTGTCCAGCCATAGCCCTGGCTCCGGCTCCAGCACAACCCAAGG 

TRP2 #28 

DEWMKRFNPPADAWPQELAPIGHNRMYNMV 
GACGAATGGATGAAGAGATTCAATCCCCCTGCCGATCCCTGGCCCCAAGAGCT 

MC1R #21 

AFHSQELRRTLKEVLTCSWAA 
GCCTTTCACTCCCAGGAACTGAGAAGGACACTGAAAGAGGTCCTGACATGCTCCTGGGCTGC 

TRP2 #15 

FSHQGPAFVTWHRYHLLCLERDLQRLIGNE 
TTCTCCCACCAAGGCCCTCCCTTTCTGACATGGCATAGGTATCACCrrCCrGTGTCTGGAAAGGG 

TRP-1 #8 

RPMVQRLPEPQDVAQCLEVGLFDTPPFYSN
AGGCCTATG<3TCCAGAGACTGCCTGAGCCTCAGGATGTGGCTCAGTGTCTGGAAGTGGGACTGTTTCACAC 

TRP-1 #13 

QDPIFVLLHTFTDAVFDEWLRRYNADISTF
CAGGATCCCATTTTCCTCCTGCTCCACACATTCACAG 

TRP2 #4 

LGAESANVCGSQQGRGQCTEVRADTRPWSG 
CTGGGAGCCGAAAGCGCTAACGTCTGCGGAAGCCAACAGGGAAGGGGACAGTGTACCGAAGTGAGAGCCGATACCAGACCCTGGAGCGGA

TRP2 #8 

YNCGDCKFGWTGPNCERKKPPVIRQNIHSL
TACAATTGCGGAGACTGTAAGTTTGGCTGGACCtKSACCCAATTG 

TRP-1 #12 

HLFLNGTGGQTHLSSQDPIFVLLHTFTDAV
CACCTCTTCCTCAACGGAACCGGAGGCCAAACCCATCTGTCCAGCCAAGACCCTATCTTTGTGCTCCTGCATACC^ 

Tyros #34 

GLVSLLCRHKRKQLPEEKQPLLMEKEDYHS 
GGCCTCGTGTCCCTGCTCTGCAGACACAAAAGGAAACAGCTCCCCGAAGAGAAACAGCCTCTGCT 

TRP2 #2 

GCKILPGAQGQFPRVCMTVDSLVNKECCPR 
GGCTGTAAGATTCTGCCTGGCGCTCAGGGACAGTTTCCCAGAGTGTGTATGACAGTGGATAGCCTCGTGAATA^ 

gp100 #43

QLPHSSSHWLRLPRIFCSCPIGENSPLLSG
CAGCTCCCCCATAGCTTCCAGCCATTGGCTCAGGCTCCCCAGAATCTTTTGCTCCTGCCCTATCGGAGAGAATAGCCCT 

gp100 #10

DGGPCPSGSWSQKRSFVYVWKTWGQYWQVL 
GACGGAGGCCCTTGCCCTAGCGGAAGCTGGAGCCAAAAGAGAAGCTTTGTGTATC 

gp100 #3

NQDWLGVSRQLRTKAWNRQLYPEWTEAQRL
AACCAAGACTGGCTGGGAGTGTCCAGGCAACTGAGAACCAAAGCCTGGAACAGACA 

Tyros #14 

IWRDIDFAHEAPAFLPWHRLFLLRWEQEIQ
ATCTGGAGGGATATCGATTTCGCTCACGAAGCCCCTGCCTTTCTCCCTTGGC 

MUC1F #1 

AAMTPGTQSPFFLLLLLTVLTVVTGSGHAS
GCCGCTATGACACCCGGAACCCAAAGCCCITTCTTTCTGCTCCTGCT 

MART #5 

DKSLHVGTQCALTRRCPQEGFDHRDSKVSL
GACAAAAGCCTCCACGTCGGCACACAGTGTGCCCTTCACCAGAAGGTGTCCCCAAGAGGGATTCGATCACAGAGACTCCAAGGTCAGCCTC 

MUC1R #2 

NVTSASGSASGSASTLVHNGTSARATTTPA
aacgtcacctccgcctccggctccgcctccggctccx;cctcc^ccctcgt^ 

Tyros #24 

LEGFASPLTGIADASQSSMHNALHIYMNGT
CTGGAAGGCTTTGCCTCCCCCCTCACCGGAATCGCTGACGCTAGCCAAAGCTCCATGCAT 

TRP2 #14 

RDTLLGPGRPYRAIDFSHQGPAFVTWHRYH 
AGGGATACCCTCCTGGGACCCGGAAGGCCITACAGAGCCATTGACTTTAGCCATCAGGGACCCGCTTTC^ 

Tyros #1 

AAMLLAVLYCLLWSFQTSAGHFPRACVSSK 
GCCGCTATGCTCCTGGCTGTGCTCTACTGTCTGCTCTGGTCCTTCCAAACCTC 

gp100 #35

AFELTVSCQGGLPKEACMEISSPGCQPPAQ 
GCCTTTGAGCTCACCGTCAGCTGTCAGGGAGGCCTCCCCAAAGAGGCTTGCATGGAGATTAGCTCCCCCGGATGCC 

Tyros #6

VDDRESWPSVFYNRTCQCSGNFMGFNCGNC
GTGGATGACAGAGAGTCCTGGCCTAGCGTCTTCTATAACAGAACCTGTCAGTGTAGCGGAAACTTT 

gp100 #34

ESAEILQAVPSGEGDAFELTVSCQGGLPKE
GAGTCCGCCGAAATCCTCCAGGCTGTGCCTAGCGGAGAGGGAGACGCTTTCGAACTGACAGTGTCCTGCCAAGGCGGACTGCCTAAGGAA

TRP2 #20 

TVCDSLDDYNHLVTLCNGTYEGLLRRNQMG
ACCGTCTGCGATAGCCTCGACGATTACAATCACCTCX3TGACACTGTGTAACGGAACCTATGAGGGACTC

Tyros #5 

LLSNAPLGPQFPFTGVDDRESWPSVFYNRT
CTGCTCAGCAATGCCCCTCTGGGACCCCAATTCCCTTTCACA 

MART #8 

YEKLSAEQSPPPYSPAA 
TACGAAAAGCTCAGCGCTGAGCAAAGCCCTCCCCCTTACTCCCCCGCTGCC 

gp100 #41

IVGILLVLMAVVLASLIYRRRLMKQDFSVP 
ATCGTCGGCATTCTGCrCGTGCTCATGGCTGTGGTCCTGGCTAGCCTCATCTATAGGAGAAGGCTCATC 

MART #3

GIGILTVILGVLLLIGCWYCRRRNGYRALM
GGCATTGGCATTCTCACAGTGATTCTGGGAGTGCTCCTGCTCATCGGATGCTGGTACTGTAGGAGAAGGAATGGCTATAGGGC 

Tyros #31 

YSYLQDSDPDSFQDYIKSYLEQASRIWSWL
TACTCCTACCTCCAGGATAGCGATCCCGATAGCTTTCAGGATTACATTAAGTCCTACCTCGAGCAAGCCT 

MUC1F #6 

QGQDVTLAPATEPASGSAATWGQDVTSVPV 
CAGGGACAGGATGTGACACTGGCTCCCGCTACCGAACCCGCTAGCGGAAGCGCTGCCACATGGGGACAGGATGTGACAAGCGTCCCCGTC 

gp100 #21

TSCGSSPVPGTTDGHRPTAEAPNTTAGQVP
ACCTCCTGCGGAAGCTCCCCCGTCCCCGGAACCACAGACGGACACAGACCCACAGCCGAAGCCCCTAACACAACCGCTGGCCAAGTGC 

MUC1R #3

LVHNGTSARATTTPASKSTPFSIPSHHSDT
CTGGTCCACAATGGCACAAGCGCTAGGGCTACCACAACCCCTGCCTCCAAGTCCACCCCTTTCTCCATCCCTAGCCATCACTCCGACACA 

TRP2 #32 

EETPGWPTTLLVVMGTLVALVGLFVLLAFL
GAGGAAACCCCTGGCTGGCCCACAACCCTCCTGGTCGTGATGGGCACACTC 

gp100 #29

TTTEWVETTARELPIPEPEGPDASSIMSTE
ACCACAACCGAATGGGTCGAGACAACCGCTAGGGAACTGCCTATCCCTGAGCCTGAGGGACCCX3ATGCCT 

MC1R #17 

GAVTLTILLGIFFLCWGPFFLHLTLIVLCP
GGCGCTGTGACACTGACAATCCTCCTGGGAATCTTTTTCCTCTGCTGGGGCCCT^ 



MC1R #8

GALVARAAVLQQLDNVIDVITCSSMLSSLC
GGCGCTCTGGTCGCCAGAGCCGCTGTGCTCCAGCAACTGGATAACGTCATCGATGTG 

gp100 #26

MTPEKVPVSEVMGTTLAEMSTPEATGMTPA
ATGACACCCGAAAAGGTCCGCGTCAGCGAAGTGATGGGCACAACCCTCGCCGAAATGTCCACCCCTGAGGCTACGGGAATGACACGCGCT 

Tyros #2 

QTSAGHFPRACVSSKNLMEKECCPPWSGDR 
CAGACAAGCGCTGGCCATTTCCCTAGGGCTTGCGTCAGCTCCAAGAATCTGATGGAGAAAGAGTGTTGCCCTCCCTC 

MC1R #11

ALRYHSIVTLPRAPRAVAAIWVASVVFSTL
GCCCTCAGGTATGACTCCATCGTCACCCTCCCCAGAGCCCCTAGGGCTGTGGCTGCCATTTGGGTCGCCTCCGTGGTCTTCTCCACCCTC 

MUC1R #12 

FREGTINVHDVETQFNQYKTEAASRYNLTI 
TTCAGAGAGGGAACCATTAACGTCCACGATGTGGAAACCCAATTCAATCAGTATAAGACAGAGGCTGCCTCO^GGTATAACCT 

Tyros #3

NLMEKECCPPWSGDRSPCGQLSGRGSCQNI
AACCTCATGGAAAAGGAATGCTGTCCCCCTTGGTCCGGCGATAGGTCCCCCTGTGGCCAACTGTCCGGCAGAGGCTCCTGCCAAAACATT

Tyros #32

IKSYLEQASRIWSWLLGAAMVGAVLTALLA
ATCAAAAGCTATCTGGAACAGGCTAGCAGAATCTGGAGCTGGCTXSCTC 

MUC1R #5 

PTTLASHSTKTDASSTHHSSVPPLTSSNHS
CCCACAACCCTCGCCTCCCACTCCACCAAAACCGATGCCTCCAGCACACACCATAGCTCCGTGCCT 

MUC1R #15 

SGAGVPGWGIALLVLVCVLVALAIVYLIAL 
AGCGGAGCCGGAGTGCCTGGCTGGGGCATTGCCCTCCTGGTCCTCGTCTGCGTCCTGGTCG 

MC1R #10 

FLGAIAVDRYISIFYALRYHSIVTLPRAPR 
TTCCTCGGCGCTATCGCTGTGGATAGGTATATCTCCATCTTTTACGCTCTGAGATACCATAGCATT^ 

gp100 #40

LIMPGQEAGLGQVPLIVGILLVLMAVVLAS 
CTGATTATGCCTGGCCAAGAGGCTGGCCTCGGCCAAGTGCCTCTGATTGTGGGAAT 

TRP2 #33 

TLVALVGLFVLLAFLQYRRLRKGYTPLMET 



TRP-1 #5

LISPNSVFSQWRVVCDSLEDYDTLGTLCNS 
CTGATTAGCCCTAACTCCGTGTTTAGCCAATGGAGAGTGGTCT 

MC1R #2 

LNSTPTAIPQLGLAANQTGARCLEVSISDG
CTGAATAGCACACCCACAGCCATTCCCCAACTGGGACTGGCTGCCAATCAGACAGGCGCTAGGTGTCTGGAAGTGTCCATCTCCGACGGA 

Tyros #28 

HRPLQEVYPEANAPIGHNRESYMVPFIPLY
CACAGACCCCTCCAGGAAGTGTATCCCGAAGCCAATGCCCCTATCGGACACAATAGGGAAAGCTATATGGTCCCCTTTATCCCT 

gp100 #24

EPSGTTSVQVPTTEVISTAPVQMPTAESTG
GAGCCTAGCGGAACCACAAGCGTCCAGGTCCCCACAACCGAAGTGATTAGCACAGCCCCTGTGCAAATGCCTACCGCTGAGTCCACCGGA 

TRP2 #11 

KKRVHPDYVITTQHWLGLLGPNGTQPQFAN
AAGAAAAGGGTCCACCCTGACTATGTGATTACCACACAGCATTGGCTCGGCCTCCTGGGACCCAATGGCACACAGCCTC^GTTTGCCAAT 

gp100 #38

LHQILKGGSGTYCLNVSLADTNSLAVVSTQ
CTGCATCAGATTCTGAAAGGCGGAAGCGGAACCTATTGCCTCAACGTCAGCCTCGCCGATACCAATAGCCT 

gp100 #30

PEPEGPDASSIMSTESITGSLGPLLDGTAT
CCCGAACCCGAAGGCCCTGACGCTAGCTCCATCATGAGCACAGAGTCCATCACAGGCTCCCTGGGACCCCT 

gp100 #31

SITGSLGPLLDGTATLRLVKRQVPLDCVLY
AGCATTACCGGAAGCCTCGGCCCTCTGCTCGACGGAACCGCTACCCTCAGGCTCGTGAAAAGGCAAGTGCCTCTGGATTG 

gp100 #5

DCWRGGQVSLKVSNDGPTLIGANASFSIAL
GACTGTTGGAGAGGCGGACAGGTCAGCCTCAAGGTCAGCAATGACGGACCCACACTGATTGGCGCTAACGCTAG 

Synthetic Protein: 



WNRQLYPEWTEAQRUX^RGGQVSLKVSITOPYIUINQDDREL^ 

HRYHLl^LEKJDMQEMLQEPSFSGHNRESYMVPFI PLYRNGDFFI SSKBLGYDLI^LERDLQRLIGNESFALPYT^FATGR>TETTEVVGTTPGQAPTAE 

PSGTTSVQVPTTEVSTDYTQELQRJDISEMFIiQIYKQGGFLGLSN^ 

l^WMGTEIX3PIRRNPAGNVARPOTQRLPEPQDVACOrr^ 

MS PLWWG FLLSCLGCKI LPGAQGQFPRVADLS YTWDFGDSSGTLI SRALWTHTYLE PLAEMSTPEATGMTPAEVSI WLSGTTAAQVIKFRPGSVW 

QLTIAFREGTINVHDVETQFGSAATWGQDVTSVPVTRPALGSTTPPAHDVI^ 

SQEIJTCTLKEVUCFFHRTCKCrGNFAGYNCGDCKFGVmSPNCLSL^ 

DASSAANRPAIjGSTAPPVHNVTSASGSASGSASTCNGTYEGLLRRNQMGRNSMKLPTLKDIRDCT 
YDHVAVLLCLWFFLAMLVLMAVLYVKLTGDENFTI PYWDWRDAEKCDI CTDEYMGLRLVKRQVPLDCVLYRYGSFSVTLDIVQGIFLQI YKQGGFLG
LSNIKFRPGSVWQLTIJWIDVITCSSMLSSIXFLGAIAVDRYISIFYRNPGNH^^ 

P IGHNRQYNMV S LADTNS LA WSTQLI M PGQ EAG LG Q V PLG P VTAQ WLQAAI PLTSCGSSPVPGTTDGHKFGFWGPNCTERRLLVRRNI FDLSAPEK 
DKLGTttTMEVTVYHRRGSRSYVPLAHSSSAFTAVAAIWVASVVFSTLFIAYYD 

RRNGYRALMDKS LHVGTQCALTRR PWHRLFLLRWEQE I QKLTGDENFT I PYWDWAAMA VQGSQRRLLCS LNSTPTAIPQLGLAAWATIAKNRNLHGP 
MYCFI CCLALSDLLVSQS SMHNALH I YMNGTMSQVQGSANDP I FLLGQH PTNPNLLS PAS FFSSWQI VCSRLEE YNYCF I CCLALSDLLVSGTNVLET 
AVI LLLEADPTLI SRNSRFSSWETVCDSLDD YNHLVTLTR PALGSTTP PAHDVTS APDN KAARDAEKCD I CTDEYMGGQH PTNPNLLS PAS FT FALQL 
HDPSGYLAEADLS YTWDFGDSSGTS S ADVE FCLSLTQYESGSMDKAAN FS FRNTG PTLI GANAS FS I ALNF PGSQKVLPDGQVIWGPFFLHLTLIVLC 
PEHPTCGCIFKNFNLFCQCSGNFMGFNCGNCKFGFWGPNCTERRLLQYR 

TEMFVTNFPGSQKVLPDGQVIWVNNTIINGSQVWGGRPTAEAPNTTAGQVPTTEWGTTPGQAPTASTPG^ 
YRJWLMKQDFSVPQLPHSSSHWLRLPRIIX3L1X3PNGTQPQFANCSVYDFFWLHYYSV^ 

LLHLAVI GALLAVGATKVPRNQTGARCLEVS I SDGLFLSLGLVSLVENAI^GSMDKAANFSFRNTLEGFASPLTGIADASSPCGQLSGRGSCQNILLS 
NAPLGPQFPFTGMHYYVSMDALLGGSEI WRDI DFAHEAPAFLEEKQPLLMEKEDYHS LYQSHLAAGCGTEVRADTRPWSGPYILRNQDDRELWPRSVP 
SSTEKNAVSMTS S VLS SHSPGSGSSTTTPMFNDI N I YDLFVWMHYYVSMDALLGG S EQPVY PQETDDACI F PDGG PCPSGSWSQKRSDSLEDYDTLGT 
LCNSTEDGPIRRNPAGNVAWVNNTIINGSQVWGGQPVYPQETDDACIFPQEK^ 
VPFSVSVSQLRLEIOMQEMl^EPSFSLPYWNFATGKNV 

VRRN I FDLS A PEKDKF FA YLTLAKHT I S S D KKGHGH S YTTAE EAAG I G I LTV I LG VLLL I FVYVW KTWGO, YWQ VLGG PV SGLS 1 GTGRAMGG P VSGLS 
IGTGRAJ^THTMEVTVYHRRG I STA PVQMPTAESTGMTPEKVPVSEVMGTTFSS WQI VCSRLEEYNSHQSLCNGTPEGPLRDPIFWLHSFTDAIFD 
EWMKRFNPPADAWPHMLARACQHACGIAR1J1KRQRPVHCGFGLKLLTVLTVVTGSG 
RNALEGFDKADGTLDSQVMSLHNLVHSHQSLCNGTPE 

HSLSPQEREQFLGALDIAQEIAPIGHNRMYNMVPFFPPVTNEELFLTSEVSIWLSGTTAAQVT^ 

LQFNSSLEDPYHTHGRYVPPSSTDRSPYEKVSAGNGGSSLLFLSLGLVSLVENALWATIAKNRNLHSPMSFI^GTNALPHSAAl^PIFW 

I FA VCQCRRKNYGQLD I F P ARDTYH PMS E Y PTV FFLAML V1MA VLYVHMLARACOHAGG I ARS TNS FRNTVEG Y S DPTG K YDP AVRS LHNLALP YWN F 

ATGKNVCDICTDDLMGSRSNFDSTITDQVPFSVSVSQLRALDGGNI^FLJWQPL 

LS YTNPAVAAASANLA YVI PI GTYGQMKNGST PMFNDI N I YDLFVWRLCQPVLPS PACQLVLHQI LKGGSGTYCLNRYGS FS VTLDIVQGIESAEILQ 
AVPSGEGDHHAFVDSIFEQWLQRHRPIXJEVYPEANAPICTDDLMGSRSNFDSTLISPN^ 

S YTN P A V AAAS ANLAAEH PTCGC I FKNFNLFLALI ICNAI IDPLI YMSQVQGS ANDPI FLLHHAFVDSI FEQWLQRRNSMKLPTLKDI RDCLSLQKFD 
N P P F FQNS LIS RAL WTHT YLE PG P VT AQ WLQ AA I P LS F ALP YWN FATGRNECD VCTDQLFG AAR PDV I G ALLA VG ATKV P RNQDWLG VS RQLR TKA 
ALJX5GNKHFl>RNOPLTFALOLHDPSGYIJVECDVCTDQLFGAARPDDPTLISRNSRFSSWEAAMP 
RSL«NLAHLFI^GTGGCTHI^SSDVSVSDVPFPFSAQSGAGVPGWGIALLV1^PQER 

VI PIGTYGQMKNGSGTNVLETAVI LLLEAGALVARAAVLQQLDNVCVLVALAI VYLIALAVCQCRRKNYGQLDICPQEGFDHRDSKVSLQEKNCEPW 
PNAP PAS VLSSHS PGSGSSTTQGQDVTLA PATE PASDEWMKRFNPPADAWPQELAPI GHNRMYNMVAFHSQELRRTLKEVLTCSWAAFSHQGPAFVTW 
HRYHLLCLERDLORLI GNERPMVQRLPEPQDVAQCLEVGLFDTPPFYSNQDPI FVLLHTFTDAVFDEWLRR YNAD I STFLGAESANVCGSQQGRGQCT 
EVRADTRPWSG YNCGDCKFGWTGPNCERKKPPVI RQNI HSLHLFLNGTGGQTHLSSQDPI FVLLHTFTDAVGLVSLLCRHKRKQLPEEKQPLLMEKED 
YHSGCKILPGACGQFPRVC>!TVDSLVNKECCPRQLPHSSSHWLRLPRIFCSCPIGENSPL 

G VSRQLRTKAWNRQLYPEWTEAQRLI WRDI DFAHEA PA FLPWHRLFLLRWEQE I QAAMTPGTQS PFFLLLLLTVLT WTGSGHASDKS LHVG TQC ALT 
RRC PQEGFDHRDS KVS LNVTS ASGSASGS ASTLVHNGTS ARATTTPALEGFAS PLTGI ADASQS SMHNALH I YMNGTRDTLLGPGRPYRAIDFSHCGP 
AFVTWHR YHAAMLLAVLYCLLWS FQTSAGHFPRACVS SKAFELTVSCQGGLPKEACWEI SS PGCQP PAQVDDRESWPSVFYNRTCQCSGNFMGFNCGN 
CESAE I LOAVPSGEGDAFELTVSCQGGLPKETVCDSLDDYNHLVTLCNGTYEGLLRRNQMGLLSNA PLG PQF PFTGVDDRESWPSVFYNRTYEKLSAE 
QS PP PYS PAAI VG I LLVLMAWLASLI YRRRLMKQDFS VPG I G I LTVI LGVLLLI GCWYCRRRNG YRALMYS YLQDSDFDSFQDYIKSYLEQASRIWS 
WLQGQD VT LA PAT E P ASG S AATWGQD VTS V P VT SCG S S PV PGTTDGH R PT A E A PNTTAGQ V PLVHNGTS ARATTT P AS KST P FS I P S HH SDT E ET PG W 
PTTLLWMGTLVALVGLFVLLAFLTTTEWVETTARELPI PEPEGPDASS1MSTEGAVTLTILLGI FFLCWGPFFLHLTLIVLCPLGAAMVGAVLTALL 
AGLVSLLCRHKRKQLPGALVARAAVLQQLDNV I DVI TCSSMLS S LCMTPE KVPVS EVMGTTLAEMSTPEATGMTPAQTSAGHFPRACVSSKNLMEKEC 
CPPWSGDRALR YHSI VTLPRAPRAVAAI WVASWFSTLFREGTI NVHDVETQFTtfQYKTEA^ 

I KS YLEQAS R I WS WLLGAAMVGAVLTALLAPTTLASHSTKTDAS STHHSS VP PLTSSNHS SGAGV PGWG I ALLVLVCVLVALAI VYLI ALFLGAIAVD 
RY I S I FY ALRYHS I VTLPRAPRLI MPGQEAG LGQV PLI VG I LLVLMA\AnoASTLVALVGLFVLLAFLQYRRLRKGYTPLMETLISPNSVFSQWRVVCD 
SLEDYDTLGTLCNSLNSTPTAIPQLGLAANOTGARCLEVSISDGHRPLQEVYPEANAPIGHNRESYMVPFIPLYEPSGTTSV 

TAESTGKKR VH PD YV I TTQHWLGLLG PNGTQPQFANLHC; I LKGG SGTYCLNV S LADTNS LA WSTQ PE P EG PDAS SIMSTESI TGS LG PLLDGT ATS I 
TGSLG PLLDGTATLRLVKROVPLDCVLYDCWRGGQVSLKVSNDG PTLI GANAS FS I AL 

Synthetic DNA : 



TGGAATAGGCAACTGTATCCCGAATGGACAGAGGCTCAGAGACTGGATTGCTGGAGGGGAGGCCAAGTGTCGCTGAAAGTGT 

CCTCAGGAATCAGGATGACAGAGAGCTCTGGCCTAGGAAATTCTTTCACAGAACCTGTAAGTGTACCX5GAAACT^ 

TTATCTGCAGCAAAGACCTCGGCTATGACTATAGCTATCTGCAAGACTCCGACCCTGACTCCTTCCAAGACTATG 

CACAGATACCATCTGCTCAGGCTCGAGAAAGACATGCAGGAAATGCTCGAGGAACCCTCCTTCTCGGGCCATAACAGAGAGTCCT 

CATTCCCCTCTACAGAAACGGAGACTTTTTCATTAGCTCCAAGGATCTGGGA 

ATGAGTCCTTCGCTCTGCCTTACTGGAACTTTGCCACAGGCAGAAACGAAACCACAGAGGT 

CCCTCCGGCACAACCTCCGTGCAAGTGCCTACCACAGAGGTCAGCACAGACTATTACCAAGAGCTCCAGAGAGACATTAGGGAAATGTTT 

CTATAAGCAAGGCGGATTCCTCGGCCTCAGCAATGCCTGTATGGAAATCTCCAGCCCTGGCTGTCAGGCTCG 

TCCCCTCCCCCGCTTGCCAACIXKn'CGACCAACTGGGATACrCCTACGCT 

CTGCTCG TGGTCATGGGAACCGAAGACGGAC C C ATTAGG AGAAACCCTGCCGG AAACGTCG CC AGACCG ATGGTGCAAAGGCTCCCCG AACCCCAAGA 
GGTCGGCCAATGCATGACCGTCGACTCCCTGGTGAACAAAGAGTGTTGCCCrAGGCrcGGCGCrGAGTCCGCCAATGTGTGTG 
GAAACCAATACAAAACCGAAGCCGCTAGC^GATACAATCTGACAATCTCCGACGTCAGCGTCA 
ATGTCCCCCCTCTGGTGGGGCTTTCTCCTCAGCTGTCTGGGATGCAAAATCC^ 

CACATGGGATTTCGGAGACTCCAGCGGAAGCCTCATCTCCAGGGCTCTGGTCGTGACACACAGATACCTCGAGCCTCTGGGTGAGATGAGCACACXr 

AAGCCACAGGCATGACCCCTGCCGAAGTGTCCATCGTCGTGGTCAGCGGAACCACAGCCGCTCAGGTGATCAAATTCAGAGCCGGAAGGGTCGTGGTC 

CAGCTCACCCTCGCCrTTTAGGGAAGGCACAATCAATGTGCATGACGTCGAGACACAGTTTGGCTCCGCCGCrACCIW 

GCCTGTGACAAGGCCTGCCCrCGGCrCCACCACACGCCCTGCCCATGACGTCCTGGATAAGAGACAGAGACCCGTCGACCAAGGCTT^ 

GAGCCGTCACCCTCACCATTCTGCTCGGCATTTTCrTTCTC 

AG CC AAG AGCT C AGGAGAACCCTCAAGG AAGTGCTCAAGTTTTTCC AT AGGAC ATG C AAATGCACAGGCAATTTCGCTGGCTAT AACTGTGGGG ATTG 
CAAATTCGGATGGACAGGCCCTAACTGTCrGTGCCTGCAAAAGTTTGACAATCCCCCrTTTG^ 
AAGGCTTTGACAAAGCCGATAGCAAAAGCACACCCTTTAGCATTCCCT
GACGCTAGCTCCGCCGCTAACAGACCCGCTCTGGGAAGCACAGCCCCTCCC^ 

CACATGCAATGGCACATACGAAGGCCTCCTGAGAAGGAATCAGATGGGCAGAAACTCCATGAAACTGCCTA 

ATCACTCCAGCGTCCCCCCTCTGACAAGCTCCAACCATAGCACAAGCCCTCAGCTCAGGACAGGCGTCAGCTTTT^ 

TACGATCACGTCGCCGTCCTGCTCTGCCTCGTGGTCTTCTTTCTGGCTATGCTCGTGCT 

CT1TACCATTCCCTATTGGGATTGGAGAGACGCTGAGAAATGCGATATCTGTACCGATGAGTATATGGGACTGAGA 

TCGACTGTGTGCTCTACAGATACGGAAGCTTTAGCGTCACCCTCGACATTGTGCAAGGCATTTTCCTC 

CTGTCCAACATTAAGTTTAGGCCTGGCTCCGTGGTCGTGCAACTGACACTGGC^ 

CTTTCTGGGAGCCATTGCCGTCGACAGATACATTAGCATTTTCTATAGGAATCCCGGAA^ 

CTGACGTCGAGTTTTGCCTCAGCCTCACCCAATAOIAATTCGATGAGTGGC^ 

CCCATTGGCCATAACAGACAGTATAACATGGTGTCCCTGGCTGACACAAACTCCCTGGCT 

CGGACTGGGACAGGTCCCCCTCGGCCCTGTGACAGCCCAAGTGGTCCTGC 

CAACCGATGGCCATAAGTTTGCCTTTTGGGGACCCAATTGCACAGAGAGAAGGCTCCTG^ 

GACAAACTGGGAACCCATACCATGGAGGTCACOSTCTACCATAGGAGAGGCTCC^ 

CGTCGCCGCTATCTGGGTGGCTAGCGTCGTGTTTAGCACAC^ 

ATAGCCAAGTGATGAGCCTCCACAATCTGGTCCACTCCTTCCTCAACGGAACCAATGCCCTCCCCCATAGCX3CTGCCAA 

AGGAGAAACGGATACAGAGCCCTCATGGATAAGTCCCTGCATGTGGGAACCCAATGC 

GTGGGAGCAAGAGATTCAGAAACTGACAGGCGATGAGAATTTCACAATCCCTTACTGGGACTG 

TCCTGGGAAGCCTCAACTCCACCCCTACCGCTATCCCTCAGCTCGGCCTCGCCGCTGTGGTCGCCACAATCGCTA^ 

ATGTATTGCTTTATCTGTTGCCTCGCCCTCAGaSATCTGCTCGTGTCCCAGTCCAGCATC 

CCAAGTGCAAGGCTCCGCCAATGACCCTATCTTTCTGCT^ 

AAATCGTCTGCTCCAGGGTCX3AGGAATACAATTACTGTTTCATTTGCTGTCTGGCTCTGTCCGA 

GCCGTCATCCTCCTGCTCGAGGCTGACCCTACCCTCATCTCCAGGAATAGCAGATTCTrCCAGCTGG 

CCATCTGGTCACCCTCACCAGACCCGCTCTGCGAAGCACAACCCCTCCCGCTCACGATGTGACAAGC^ 

AAAAGTGTGACATTTGCACAGACGAATACATGGGCGGACAGCATCCCACAAACCCTAACCTCCTGTCCCCCGCTA 

CACGATCCCTCCGGCTATCTGGCTGAGGCrrcACCTCAGCTATACCrc 

GTCCCTGACACAGTATGAGTCCGGCTCCATGGATAAGGCTGCCAATTTCTCCTTCAGAAACACAGGCCCT 
CCATCGCTCTGAATTTCCCTGGCTCCCAGAAAGTGCTCCCCGATGGCCAAGTGATTTGGGGACCCTITTTCCTC 

CCCGAACACCCTACCTGTGGCTGTATCTTTAAGAATTTCAATCTG'i'l'l"! GCCAATGCTCCGGCAATTTCATGGGCTTTAACTGTGGCAATTGCAAArT 

AAAGGTATACCX5AAGAGGCTGCCGCTCCCCTCGAGAATGCCCCTATCGGACACAATAGGCAATACAAT 
ACCGAAATGTTTGTGACAAACTTTCCCGGAAGCCAAAAGCTCCTGCCTGACGGACAG 

GTGGGGCGGAAGGCCTACCGCTGAGGCTCCCAATACCACAGCCGGACAGGTCCCCACAACCGAAGTGGTCGGCACAACCCCTGGC 

CTAGCACACCCGGAGGCGAAAAGGAAACCTCCGCCACACAGAGAAGCTCCGTGCCTAGCTCCACCGAAAAGAATGCCGTCAGCATXa 

TACAGAAGGAGACTGATGAAGCAAGACTTTAGCGTCCCCCAACTGCCTCACTCCAGCTCCCACTGGCTGAG 

CCCTAACGGAACCCAACCCCAATTCGCTAACTGTAGCGTCTACGATTTCTTTGTGTGGCTGCATTACTATAGCGTCTGCCTCGA 

ATACCCCTCCCITTTACTCCMCrCCACCAATAGCTTTAGGAATACCGTCGAGGGATACTCCGACCCTGCCGCT 

CTGCTCCACCTCGCCGTCATCGGAGCCCTCCTGGCTGTGGGAGCCACAAAGGTCCCCAGAAACCAAACCG^ 



TCGAGGCATTCGCTAGCCCTCTGACAGGCATTGCCGATGCCTCCAGCCCTTGCGGACAGCTCAGCGGAAGGGG/^ 
AACGCTCCCCTCGGCCCTCAGTTTCCCTTTACCGGAATGCATTACTATGTGTCCATGGA 

CTTTGCCCATGAGGCTCCCGCTTTCCTCGAGGAAAAGCy^CCCCrCCrcATGGAGAAAGAGGATTACCATAGCCT 
GCCAATGCACAGAGGTCAGGGCTGACACAAGGCCTTGGTCCGGCCCTTACATTCTGAGAAA 

TCC AGCACAG AG AAAAACGCTGTGTC CATG ACAAGCTCCGTGCTCAGCTCCCACT CCCCCGG AAG CGGAAGCTCCACCACAACCCCTATGTTTAACG A 
TATCAATATCTATCACCTCTTCGTCTGGATGCACTATTACGTCAGCATGGACGCT 

CTCTGCAATAGCACAGAGGATGGCCCTATCAGAAGGAATCCCGCTGGCAATGTGGCTTGGGTCAACAA 
AGGCCAACCCGTCTACCCTCAGGAAACCGATGACGCTTGCAT^ 

AGAAACTGTCCGCCGAACAGTCCCCCCCTCCCTATAGCCCTAGCAGAAGCTATGTGCCTCTGGCTCACTCCAGCTCCGCCTTTACCA 
GTCCCCTTTAGCGTCAGCGTCAGCCAACTGAGACTGGAAAAGGATATGCAAGAGATGCTGCAAGAGCCT 
TACCGGAAAGAATGTGTCTGACATTGTGCCTTTCTGGCCCCCTGTGACAAACACAGAGATGTTCGTC 
AGGCTGCCTGCTCCGTGTATGACTTTTTCGTCr 

GTGAGAAGGAATATCTTTGACCTCAGCGCTCCCGAAAAGGATAAGTTTTTCGCTTACCTCACCCT 
AAACCTGGGGCCAATACTGGCAGGTCCTGGGAGGCCCTGTGTCCG^ 

ATCGGAACCGGAAGGGCTATGCTCGGCACACACACAATGGAAGTGACAGTGTATCACAGAAGGGGAATCT 

CG AAAGCACAGGCATG ACCC CTG AG AAAGTGCCTGTGTCCG AGGTC ATGGG AACCACATTCTCCAG CTGG CAGATTGTGTGT AGCAGACTGGAAG AGT 

ATAACTCCCACCAAAGCCTCTGCAATGCCACACCCGAAGGCCCTCTGAGAGACCCTATCTTTGTGGT 

GAGTGGATGAAAAGGTTTAACCCTCCCGCTGACGCTTGGCCTCACATGCTGGCTAGGGCTTGC 

GCAAAGGCCTGTGCATCAGGGATTCGGACTGAAACTGCTCACCGTCCTGACAGTGGTCACCX3GAAGCGGACACGC 

AAGAGACAAGCGCTACCCAAAGGTCCTTCTGTAGCTGTCCCATTGGCGAAAACTCCCCCCTCCTGTCCGGCCAA 

AGGAATGCCCTCX3AGGGATTCGATAAGGCTGACGGAACCCTCGACTCCCAGGTCATGTCCCT 

CGG^CCCCTGAGGGACCCCTCAGGAGAAACCCTGGCAATCACGATAAGTCCAGGACACCCAGACTGCCTCCCTTTT^ 

AACTGTTTCTGACAAGCGATCAGCTCGGCTATAGCTATGCCATTGACCTCCCCGTCAGCGTCGAGAGAAAGAAACCCCCTGTGATTAGGCAA 

CACTCCCTGTCCCCCCAAGAGAGAGAGCAATTCCTCGGCGCTCTGGATCTGGCTCAGGAACTGGCrCCCATTGGCCATAAC 

GCCTTTCTTTCCCCCTGTGACAAACGAAGAGCTCTTCCTCACCTCCGAGGTCAGCATTGTGGT 

CAGAGTGGGTGGAAACCACAGCCAGAGAGCTCCCCATTACCTCCCCCCAACTGTCCACCGGAGTGTCCi T CT ITTI CCTCAGCTTTCACATTAGCAAT 
CTGCAATTCAATAGCTCCCTGGAAGACCCTTACCATACCCATGGCAGATAOGTCCCCCCTAGCTCCACCGATAGGTCCCCCTATC 

TCCACTCCCCCATGAGCTTTCTGAATGGCACAAACGCTCTGCCT 
ATCTTTGCCGTCTGCCAATGCAGAAGGAAAAACTATGGCCAACTGGA 

GTTTTTCCTCGCCATGCTGGTCCTGATGGCCGTCCTGTATGTGCATATGCTCGCCAGAGCCTGTCAGCATGCCCAAGGCATTC 




kCTTTAGCTTTAGGAATACCC 
CCTTCAGAAACACAGTGGAAGGCTATAGCGATCCCACAGGCAAATACGAT^

GCCACAGGCAAAAACGTCTGCGATATCTGTACCGATGACCTCATGGGAAGCAGAAGCAATTTCGATAGCACAATCACAGACCAAGTC 

GTCCX5TGTCCCAGCTCAGGGCTCTGGATGGCGGAAACAAACACTTTCTGAGAAACCAACCCCTCT 

TOSAGGATCCCTCCACCGATTACTATCAGGAACTGCAAAGGGATATCrCCGAGATGAGCCCITACGAAAAGGTCA 

CTGTCCTACACAAACCCTGCCGTCGCCGCTGCCTCCGCCAATCTGGCTTACGTCATCCCTATCXMAACCTATGGCCAAATGAAA 

CATGTTCAATGACATTAACATTTACGATCTK5TTTGTGTGGAGGCTCTG 

GCCGTCCCCTCCGGCGAAGGCGATCACCATGCCTTTGTGGATAGCATTTTCCAACAGTGGCTGCAAAGGCATAGGCCT 
GGCTAACGCTCCCATTTGCACAGACGATCTGATGGGCTCCAGGTCCAACTTTGACTCCACC 

TCGTCTGTTTCCCTGCCAGAGACACATACCATCCCATGAGCGAATACCCTACCTATCACACACACGGAAGGTATGTGCCTCCC 
AGCTATACCAATCCCXSCTGTWCTGCCGCTAGCGCTAACCrCGCCGCTGAGCATCCCACATGCGG 

cctcatcatttgcaatcccattatcgatcccctcatctatatgtcccaggtccagggaagcgctaacgatcccatt^ 

tcgactccatctttgagcaatggctccagagaaggaatagcatgaagctccccacactgaaagacattagggatt 

aaccctccctttttccaaaactccctgattagcagagccctcgtggtcacccatacctatctggaacccggacccgt 

ggctgccattcccctcagctttgccctcccctattggaatttosctaccggaaggaatgagtg 

gacccgatgtgattggcgctctgctcgccgtcggcgctaccaaagtgcctaggaatc^ggatt^ 

gccctcgacggaggcaataagcatttcctcaggaatcagcctctgacattcgctc 

gtgtaccgatcagctcttcggagccgctaggcctgacgatccgacactgattagcagaaactccaggtttagctcctggg 

aagacgctcactttatctatggctatcccaaaaagggacacggacactcctacacaaccgctgaggaagccgctaccgga^ 

aggtccctg cat aacctcg cccatctgtttctg aatgg cacagg cgg acag acacacctcagctcc agcg atgtgtccgtgtccg acgtcccctttcc 

ctttagcgctcagtccggcgctggcgtccccggatggggaatcgctctgctcgtgctcagccctcaggaaagggaac 

tcgccaaaaagagagtgcatcccgattacgtcatcacaacccaacactggttctttgcctatctgacac 

gtgattcccattggcacatacggacagatgaagaatggctccggcacaaacgtcctggaaaccgctgtg 

ggctagggctgccgtcctgcaacagctcgacaatgtgtgtgtgctcgtggctctggctatcgtctacctcatcgct 

gaaagaattacggacagctcgacatttgccctcaggaaggctttgaccatagggatagcaaagtc^ 

cccaatgcccctcccgctagcgtcctgtccagccatagccctggctccggctccagcacaacccaaggccaagacgtcaccctcgcccct 
gcctgcctccgacx;aatggatgaagagattcaatccccctgccgatgcctggccccaagagctcgcccctatcw 

tcg cctttc actcc caggaactgagaaggacactgaaagaggtcctg a catgctcctggg ctgccttctccc accaaggccctgcctttgtgacatgg 

cataggtatcacctcctgtgtctggaaagggatctgcaaaggctcatcggaaacgaaaggcctatggtccagagactc 

tcagtgtctggaagtgggactgtttgacacaccccctttctatagcaatcaggatccca 

GAAGTGAGAGCCGATACCAGACCCTGGAGCXK5ATACAATTGCGGAGACTCTAAGTTTGGCT 

CATCAGACAGAATATCCATAGCCTCCACCTCTTCCTCAACGGAACCGGAGGCCAAACCCATCTGTCCAGCCAAGACCCTATCTTTGTC 

CCTTTACCGATGCCGTCGGCCTCGTGTCCCTGCTCTGCAGACACAAAAGGAAACAGCTCCCCGAAGAGAAACAG 

TATCACTCCGGCTGTAAGATTCTGCCTGGCGCrTCAGGGACAGTTT 

ACAGCTCCCCCATAGCTCCAGCC^TTGGCTCAGGCTCCGCAGAATCTTTTGCTCCTGCCCTATCGGAGAGAATAGCCCT 
GCCCTTGCCCTAGCGGAAGCTGGAGCCAAAAGAGAAGCrTTGTGTATGTGTGGAAGACATGGGGACAGTATTGGCAAG 

GGAGTGTCCAGGCAACTGAGAACCAAAGCCTGGAACAGACAGCTCTACCCTGAGTGGACCGAAGCCCAAAGGCTCATCTGGAGGGATATCGA 

TCACGAAGCCCCrGCCTTTCTGCCTTGGCATAGGCTCTTCCTCCTGAGATGGGAACAGGAAATCCAA 

TCTTTCTGCTCCTGCTCCTGACAGTGCTCACCGTCGTGACAGGCTCCGGCCATGCCTCCGACAAAAGCCTCCACGT^ 

agaaggtgtccccaagagggattcgatcacagagactccaaggtcagcctcaacgtcacctccgcctccggctccccctcc^ 
cgtgcataacggaacctcrcccagagccacaaccacacccgctctggaaggctttgcctcccccctcaccggaatcgctgacgctag 
tgcataacgctctgcatatctatatg aatgg c^c^gggatagccrcctgggacccggaagggcttacagagccattgactttagccatcagggaccc 
gctttcgtcacctggcacagataccatgccgctatgctcctggcrgtgctctactgtct 

agcctgtgtgtccagcaaagcctttgagctcaccgtcagctgtcagggaggcctccccaaagaggcttgcatggagattagctcccccggatc 

cccctgcccaagtggatgacagagagtcctggccragcgtcttctataacagaacctgtcagtgtagcggaaactttatggga 

tgtgagtccgccgaaatcctccaggctgtgcctagcggagagggagacgctttcgaactgacagtgt^ 

ctgcgatagcctcgacgattacaatcacctcgtgacactgtgtaacggaacctatgagggactgctcag 

cccctctgggaccccaattccctttcacaggcgtcgacxsatagggaaagcrggccctccgtgm 

caaagccctcccccttactcccccgctgccatcgtcgccattctgcrcgtgctcatggctgtggtcctg^ 

gaaacaggatttctccgtgcctggcattggcattctgacagtgattctgggagtgctcctgctcatcggatgctggtactgt 

atagggctctgatgtactcctacctccaggatagcgatcccgatagctttcaggattacat^ 

tggctccagggacaggatgtgacactggctcccgctaccgaacccgctagcggaagcgctgccacatggggacaggatgtgacaagcgtcc 
ctcctgcgcaagctcccccgtccccggaaccacagacxmacacagacccacagccgaagcccctaacacaaccgctggccaagtgcctctggtcca^ 
atggcacaagcgctagggctaccacaacccctgcctccaagtccaccccittctccatccctagccatcactccgacacagagga 
cccacaaccctcctggtcgtgatgggcacactggtcgccctggtgggacrctttgtgct 

cgctagggaactgcctatccctgagcctgagggacccgatccctccagcattatgtccaccgaaggcgctgtgacactgaca 
ttttcctctgctggggccctttctttctgcatctgacactgattgtgct 
gccggactggtcagcctcctgtgtaggcataagagaaagcaactgcctggcg ctct 

cgatgtgattacctgtagctccatgctcagctccctgtgtatgacacccgaaaaggtccccgtcagcgaagtgatgg 

ccacccctgaggctaccggaatgacacccgctcagacaagcgctggccatttccctagggcttgggtcagctccaagaatctgatg 

tgccctccctggagcggagacagagccctcaggtatcactccatcgtcaccctccccagagccccragggctgtggctgccatttgggttrg 

ggtcttctccaccctcttcagagagggaaccattaacgtccacgatgtggaaacccaat^ 

tcaccattaacctcatggaaaaggaatgctgtcccccttggtccggcgataggtccccctgtggcgaactgt^ 

atcaaaag ctatctgg aacaggct agcag aat ctgg agctggctg ctcggcgctgccatggtgggagccgtcctg acagccctcctggctcccac aac 

cctcgcctcccactccacc^aaaccgatgccrccagc^cacaccatagctccgtgcctcccctc^ccrccagcaatcact 

ctggctggggcattgccctccixk;tcckx;tctgcgtcctggtcgccctcgccattgtgtatctc 

aggtatatctccatcitttacgctctgagataccatagcattgtgacactgcctagggctcccagactgattatg 

ccaagtgcctctgattgtgggaatcctcctggtcctgatggccgtcgtgctcgcctccaccct 

ttctgcaatacagaaggctcaggaaaggctatacccctctgatggagacactgatta 

agcctcgaggattacgataccctcx3gcac*ctgtgtaactccctgaatagcacacccacagcc7utccc^ 

cgctaggtgtctggaagtgtccatctccgacggacacagacccctccaggaagtgtatcccgaagccaatgcccctatcggacacaatagggaaagct 
AT ATGGTCCCCTTTATCCCTCTGT ATG AGCCT AG CGG AACCACAAGCGTCCAGGTCCCCACAACCG AAGTG ATT AGCACAG CCCCTG TGCAAATGC CT

ACCGCTGAGTCCACCGGAAAGAAAAGGGTCCACCCTGACTATGTGATTACCACACAGCATTGGCTaSGCCTCCTGGGACCC 

GTTTGCCAATCTGCATCAGATTCTGAAAGGCGGAAGCGGAACCTATTGCCTCAACGT 

AACCCG AACCCG AAGGCCCTG ACG CT AGCTCCATCATGAG CACAGAGTCCATCACAGGCTCCCTGGG ACCCCTCCTGGATGGCACAGCCACAAG CATT 

ACCGGAAGCCTCGGCCCTCTGCTCGACGGAACCGCTACCCTCAGGCTCGTGAAAAGGCAAGT^ 

CGGACAGGTCAGCCTCAAGGTCAGCAATGACGGACCCACACTGATTGGCGCTAAC^ 

Melanoma cancer Specific Savine Scramble process 



Scramble - Output File

Scramble version : 0.1 beta, 08/02/1999
Num. genes       : 10
Num. segments    : 121
Segment length   : 30
Segment overlap  : 15
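
The parameters above (segment length 30, segment overlap 15) and the offsets 1, 16, 31, ... listed for each gene are consistent with a sliding window over each parent sequence. The following is a minimal Python sketch of that windowing only. The function name scramble_segments and the stopping rule (stop once a window reaches the end of the sequence) are assumptions made for illustration and are not taken from the Scramble program itself. The BAGE sequence used in the example is reassembled from the three BAGE segments listed below.

```python
def scramble_segments(protein, length=30, overlap=15):
    """Split a protein into overlapping windows.

    Returns a list of (segment_number, offset, peptide) tuples, where
    offset is the 1-based start position, as in the listing.
    Assumption: windowing stops once a window reaches the sequence end,
    so the last segment may be shorter than `length`.
    """
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than segment length")
    segments = []
    start = 0
    while True:
        piece = protein[start:start + length]
        segments.append((len(segments) + 1, start + 1, piece))
        if start + length >= len(protein):
            break
        start += step
    return segments


# BAGE parent sequence as reassembled from the segments in this listing
# (hypothetical variable name; the flanking "AA" residues appear in the listing).
bage = "AAMAARAVFLALSAQLLQARLMKEESPVVSWRLEPEDGTALCFIFAA"

for number, offset, peptide in scramble_segments(bage):
    print(f"Segment # : {number}  Offset : {offset}  {peptide}")
```

Under these assumptions the BAGE sequence yields three segments at offsets 1, 16 and 31, matching the BAGE records in the listing.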

Segments in original order: 

Gene : BAGE 

Segment # : 1 
Offset : 1 
1st Codon : 1 

AAMAARAVFLALSAQLLQARLMKEESPVVS 
GCCGCTATGGCTGCCAGAGCCGTCTTCCTCGCCCTCAGCGCTCAGCTCCT^ 



Gene : BAGE 

Segment # : 2
Offset : 16 
1st Codon : 1 

LLQARLMKEESPVVSWRLEPEDGTALCFIF 
CTGCTCCAGGCTAC^CTCATGAAAGAGGAAAGCCCTGTGGTCAGCTGGAGGCT 



Gene : BAGE 

Segment # : 3
Offset : 31 
1st Codon : 1 

WRLEPEDGTALCFIFAA 
TGGAGACTGGAACCCGAAGACGGAACCGCTCTGTGTTTCATTTTCGCTGCC 



Gene : GAGE-1

Segment # : 1
Offset : 1
1st Codon : 1

AAMSWRGRSTYRPRPRRYVEPPEMIGPMRP
GCCGCTATGTCCTGGAGAGGC^GAAGCACATACAGACCCAGACCCAGAAGGTATGTGGAACC^ 



Gene : GAGE-1

Segment # : 2
Offset : 16
1st Codon : 1

RRYVEPPEMIGPMRPEQFSDEVEPATPEEG
AGGAGATACGTCGAGCCTCCCGAAATGATTGGCCCTATGAGACCCGAACAGTTTAGCGATGAGGTCGAGCCTGCCAC^CCCGAAGAGGGA 



Gene : GAGE-1

Segment # : 3
Offset : 31
1st Codon : 1

EQFSDEVEPATPEEGEPATQRQDPAAAQEG
GAGCAATTCTCCGACGAAGTGGAACCCGCTACCCCTGAGGAAGGCGAACCCGCTACCCAAAGGCAAGACCCTGCCGCTGCCCAAGAGGGA 



Gene : GAGE-1 

Segment # : 4
Offset : 46
1st Codon : 1

EPATQRQDPAAAQEGEDEGASAGQGPKPEA 
GAGCCTGCCACACAGAGACAGGATCCCGCTGCCGCTCAGGAAGGCGAAGACXSAAGGCGCTAGCGCTC 



Gene : GAGE-1

Segment # : 5
Offset : 61
1st Codon : 1



Figure 27 (Cont) 



WO 01/090197 PCT/AU01/00622 



184/216 



EDEGASAGQGPKPEADSQEQGHPQTGCECE
GAGGATGAGGGAGCCTCCGCCGGACAGGGACCCAAACCCGAAGCCGATAGCCAAGAGCAAGGCCATCCCCAAACCGGATGCGAATGCGAA



Gene : GAGE-1 

Segment # : 6
Offset : 76
1st Codon : 1

DSQEQGHPQTGCECEDGPDGQEMDPPNPEE
GACTCCCAGGAACAGGGAC^CCCTCAGAC^GGCTGTGAGTGTGAGGATGGCCCTGACGGACAGGAAATC 



Gene : GAGE-1

Segment # : 7
Offset : 91
1st Codon : 1

DGPDGQEMDPPNPEEVKTPEEEMRSHYVAQ



GACGGACCCGATGGCC^GAGATGGACCCTCCCAATCCCGAAGAGGTC^GACACCCGAAGAGGAAATGAGAAGCCATTACGTCGCCCAA 



Gene : GAGE-1 

Segment # : 8
Offset : 106
1st Codon : 1

VKTPEEEMRSHYVAQTGILWLLMNNCFLNL 
GTGAAAACCCCTGAGGAAGAGATGAGGTCCCACTATGTGGCTCAGACAGGCATTCTGTGGCTGCT 



Gene : GAGE-1

Segment # : 9
Offset : 121
1st Codon : 1

TGILWLLMNNCFLNL



ACCGGAATCCTCTGGCTCCTGATGAACAATTGCTTTCTGAATCTG 



Gene : gp100In4

Segment # : 1
Offset : 1
1st Codon : 1

AASWSQKRSFVYVWKTWGEGLPSQPIIHTC
GCCGCTAGCTC^AGCO^AAAGAGAAGCTTTGTGTATGTGTGGAAGACATGGGGAGAGGGACTC^ 



Gene : gp100In4

Segment # : 2
Offset : 16
1st Codon : 1

TWGEGLPSQPIIHTCVYFFLPDHLSFGRPF
ACCTGGGGCGAAGGCCTCCCCTCCCAGCCTATCATTCACACATGCGTCTACTTTTTCCT 



Gene : gp100In4

Segment # : 3
Offset : 31
1st Codon : 1

V Y F

F L P D H 



F G R 



H 



N 



GTGTATTTCTTTCTGCCTCACCATCTGTCCTTCGGAAGGCCTTTCCATCrGAATTTCTGTGA 



Gene : MAGE-1

Segment # : 1
Offset : 1
1st Codon : 1

AAMSLEQRSLHCKPEEALEAQQEALGLVCV



GCCGCTATGTCCCTGGAACAGAGAAGCCTCCACTGTAAGCCTGAGX3AAGCCCT 



Gene : MAGE-1

Segment # : 2
Offset : 16
1st Codon : 1

E A L E

A O Q 



E A L G 



Q A A 



S S S 



L -G 



GAGGCTCTGGAAGCCCAACAGGAAGCCCTCGGCCTCGTGTGTGTGCAAGCCGCTACCTCCAG 



Gene : MAGE-1

Segment # : 3
Offset : 31
1st Codon : 1

Q A A

S s s s 



LEE 



TAGS 



P g 

Gene : MAGE-1

Segment # : 4
Offset : 46
1st Codon : 1

EEVPTAGSTDPPQSPQGASAFPTTINFTRQ



GAGGAAGTGCCTACCGCTGGCTCCACCGATCCCCCTCAGTCCCCCCAAGGCGCT^ 



Gene : MAGE-1

Segment # : 5
Offset : 61
1st Codon : 1

QGASAFPTTINFTRQRQPSEGSSSREEEGP
CAGGGAGCCTCCGCCTTTCCCACAACCATTAACTTTACCAGACAGAGACAGCCTAGCGAAGGCTCCAGCTCCAGGGAAGAGGAAGGCCCT



Gene : MAGE-1 

Segment # : 6
Offset : 76
1st Codon : 1

RQPSEGSSSREEEGPSTSCILESLFRAVIT
AGGCAACCCTCCGAGGGAAGCTCCAGCAGAGAGGAAGAGGGACCCTCCACCTCCTGCATTCTGGAAAGCCTCTTCAGAGCCGTCATCACA



Gene : MAGE-1

Segment # : 7
Offset : 91
1st Codon : 1

STSCILESLFRAVITKKVADLVGFLLLKYR



AGCACAAGCTGTATCCTCGAGTCCCTGTTTAGGGCTGTGATTACCAAAAAGGTCG 



Gene : MAGE-1 

Segment # : 8
Offset : 106 
1st Codon : 1 

KKVADLVGFLLLKYRAREPVTKAEMLESVI 
AAGAAAGTGGCTGACCTCGTGGGATTCCTCCTGCTCAAGTATAGGGCT 



Gene : MAGE-1

Segment # : 9
Offset : 121
1st Codon : 1

AREPVTKAEMLESVIKNYKHCFPEIFGKAS



GCCAGAGAGCCTGTGACAAAGGCTGAGATGCTGGAAAGCGTCATCAAAAACTATAAGCATTG<^ 



Gene : MAGE-1 

Segment # : 10
Offset : 136
1st Codon : 1

KNYKHCFPEIFGKASESLQLVFGIDVKEAD

ATTTTCGGAAAGGCTAGCGAAAGCCTCCAGCTCGTGTTTGGCAT*TGACGTCAAGGAAGCCGAT 



Gene : MAGE-1 

Segment # : 11
Offset : 151
1st Codon : 1

ESLQLVFGIDVKEADPTGHSYVLVTCLGLS
GAGTCCCTGCAACTGGTCTTCGGAATOSATGTGAAAGAGGCTGACCCTACC^ 



Gene : MAGE-1

Segment # : 12
Offset : 166
1st Codon : 1

PTGHSYVLVTCLGLSYDGLLGDNQIMPKTG



CCCACAGGCCATAGCTATGTGCTCGTGACATGCCTCGGCCTCAGCTATGACGGACTGCTCGGCGATAACCAAA 



Gene : MAGE-1 

Segment # : 13
Offset : 181
1st Codon : 1

YDGLLGDNQIMPKTGFLIIVLVMIAMEGGH
TACGATGGCCTCCTGGGAGACAATCAGATTATGCCrrAAGACAGGCTTTCTC 



Gene : MAGE-1

Segment # : 14
Offset : 196
1st Codon : 1

FLIIVLVMIAMEGGHAPEEEIWEELSVMEV
TTCCTCATCATTGTGCTCGTGATGATCC^TATGGAAGGCGGACACGCTCCCGAAGAGGAAATCTC 



Gene : MAGE-1

Segment # : 15
Offset : 211
1st Codon : 1

APEEEIWEELSVMEVYDGREHSAYGEPRKL
GCCCCTG AGG AAG AG ATTTGGGAAGAG CTC AGCGTCATGGAAGTGTATG ACGG AAGGG AACACT^ CG AACCCAG AAAGCTC 

Gene : MAGE-1

Segment # : 16
Offset : 226
1st Codon : 1

YDGREHSAYGEPRKLLTQDLVQEKYLEYRQ
TACGATGGCAGAGAGCATAGCGCTTACC-GAGAGCCTAGGAAACTGCTCACCCAAGACCTCGTC 

Gene : MAGE-1

Segment # : 17
Offset : 241
1st Codon : 1

LTQDLVQEKYLEYRQVPDSDPARYEFLWGP
CTGACACAGGATCTGGTCCAGGAAAAGTATCTGGAATACAGACAGGTCCCCGATAGC^ 

Gene : MAGE-1

Segment # : 18
Offset : 256
1st Codon : 1

VPDSDPARYEFLWGPRALAETSYVKVLEYV
GTGCCTGACTCCGACCCrGCCAGATACGAATTCCTCTGGGGACCCAG^ 

Gene : MAGE-1

Segment # : 19
Offset : 271
1st Codon : 1

RALAETSYVKVLEYVIKVSARVRFFFPSLR
AGGGCTCTGGCTGAGACAAGCTATGTGAAAGTGCTCGAGTATGTGATTA^ 

Gene : MAGE-1

Segment # : 20
Offset : 286
1st Codon : 1

IKVSARVRFFFPSLREAALREEEEGVAA
ATCAAAGTGTCCGCCAGAGTGAGATTCTTTTTCCCTAGCCTCAGGGAAGCCGCTCTGAGAGAGGAAGAGGAAGGCGTCGCCGCT

Gene : MAGE -3 

Segments : 1 
Offset : 1 
1st Codon : 1 

AAMPLEQRSQHCKPEEGLEARGEALGLVGA
GCCGCTATGCCTCTGGAACAGAGAAGCCAACACTGTAAGCCTGAGGAAGGCCTC^ 

Gene : MAGE- 3 

Segment # : 2
Offset : 16 
1st Codon : 1 

EGLEARGEALGLVGAQAPATEEQEAASSSS 
GAGGGACTCGAAGCCAGAGGCGAAGCCCTCGGCCTCGTGGGAGCCCAAGCC^ 

Gene : MAGE -3 

Segment # : 3

Offset : 31 

1st Codon : 1 



QAPATEEQEAASSSSTLVEVTLGEVPAAES
CAGGCTCC<X»CTACCGAAGAGCAAGAGGCTGCCTCCAGCTCCAGCACACTG 



Gene : MAGE-3

Segment # : 4
Offset : 46
1st Codon : 1

TLVEVTLGEVPAAES PD PPQSPQGASS LPT 
ACCCTCGTGGAAGTGACACTGGGAGAGGTCCCCGCTGCCGAAAGCCCTGACCCTCCCCAAA 



Gene : MAGE -3 

Segment # : 5
Offset : 61 

1st Codon : 1 

PDPPQSPQGASSLPTTMNYPLWSQSYEDSS
CCCGATCCCCCTC^GTCCCCCCAAGGCGCTAGCTCCCTGCCTACCACAATGAATTACCCTCTGTGGA 



Gene : MAGE- 3 

Segment # : 6
Offset : 76 

1st Codon : 1 

TMNYPLWSQSYEDSSNQEEEGPSTFPDLES 
ACC^TGAACTATCCCCTCTGGTCCCAGTCCTACGAAG^ 

Gene : MAGE- 3 

Segment # : 7
Offset : 91 
1st Codon : 1 

NQEEEGPSTFPDLE S EFQAALSRKVAE LVH 
AACCMGAGGAAGAGGGACCCTCCACCTTTCCCGATCTGGAAAGCGAATTCCAAGCCGCTCrc 

Gene : MAGE -3 

Segments : 8 
Offset : 106 
1st Codon : 1 

EFQAALSRKVAELVH FLLLKYRAREPVTKA 
G AGTTTCAGGCTGCCCTCAGCAGAAAGGTCGCCG AACTGGTCC ACTTTCTG CTCCTG AAAT ACAG AGCCAG AGAG CCTGTG ACAAAGGCT 

Gene : MAGE -3 

Segment # : 9 

Offset : 121

1st Codon : 1 

FLLLKYRAREPVTKAEMLGSVVGNWQY FFP 
TTCCTCCTGCTCAAGTATAGGGCTAGGGAACCCGTCACCAAAGCCGAAATC 

Gene : MAGE -3 

Segment # : 10
Offset : 136 
1st Codon : 1 

EMLGSVVGNWQYF FPVI FS KASSSLQLVFG 
GAGATGCTGGGAAGCGTCGTGGGAAACTGGCAGTATTTCTTTCCC^ 

Gene : MAGE -3 

Segment # : 11
Offset : 151 

1st Codon : 1 

VIFSKASSSLQLVFGIELMEVDPIGHLYIF
GTGATTTTCTCCAAGGCTAGCTCCAGCCTCCAGCTCGTGTTTGGCAT^ 

Gene : MAGE -3 

Segment # : 12
Offset : 166 

1st Codon : 1 

I ELMEVDPIGHLYI FATCLGLSYDGLLGDN 
ATCGAACTGATGGAGGTCGACCCTATCGGACACCrrCTACATTTTCGCTACCTGTCTGG 

Gene : MAGE -3 

Segment # : 13
Offset : 181 
1st Codon : 1 

ATCLGLSYDGLLGDNQIMPKAGLLIIVLAI
GCCACATG CCTCGGC CTC AGCTATG ACGGACTGCTCGGCG ATAACCAAATC ATGCCC AAAG CCGG ACTGCTC ATCATTGTG CTCGCCATT 

Gene : MAGE -3 

Segment # : 14

Offset : 196 

1st Codon : 1 

Q I M PKAGLLI I VLAI IAREGDCAPEEKIWE 
CAGATTATGCCTAAGGCTGGCCTCCTGATTATCGTC

Gene : MAGE -3 

Segment # : 15 
Offset : 211 
1st Codon : 1 

IAREGDCAPE EK IWEELSVLEVFEGREDSI 
ATCGCTAGGGAAGGCGATTGCGCTCCCGAAGAGAAAATCTTOGAGGAACTG 

Gene : MAGE -3 

Segment # : 16

Offset : 226

1st Codon : 1 

ELSVLEVFEGREDSILGDPKKLLTQHFVQE
GAGCTCAGCGTCCTGGAAGTGTTTGAGGGAAGGGAAGACTCCATCCTCGGCGATCCC 

Gene : MAGE -3 

Segment # : 17
Offset : 241 

1st Codon : 1 

LGDPKKLLTQHFVQENYLEYRQVPGSDPAC 
CTGGGAGACCCTAAGAAACTGCTCACCGAACACTTTGTGCAAGA 

Gene : MAGE- 3 

Segments : 18 

Offset : 256 

1st Codon : 1 

NYLEYRQVPGSDPACYEFLWGPRALVETSY
AACTATCTGGAATACAGACAGGTCCCCGGAAGCGATCCCGCTTGCTATGAG 

Gene : MAGE- 3 

Segments : 19 
Offset : 271
1st Codon : 1 

YEFLWG PRALVETSYVKVLHHMVKI SGG PH 
TACGAATTCCTCTGGGGACCCAGAGCCCTCGTGGAAACCTCCTACGTO^GGTCCT 

Gene : MAGE- 3 

Segments : 20 
Offset : 286 
1st Codon : 1 

VKVLHHM VKI SGGPHISYPPLHEWVLREGE 
GTGAAAGTGCTCCACCATATGGTCAAGATTAGCGGAGGCCCTCACATTAGCTATCCCCCT 

Gene : MAGE -3 

Segment # : 21
Offset : 301
1st Codon : 1

ISY PPLHEWVLREGEEAA 
ATCTCCTACCCTCCCCTCCACGAATGGGTCCTGAGAGAGGGAGAGGAAGCCGCT 

Gene : PRAME 

Segment # : 1
Offset : 1
1st Codon : 1 

AAMERRRLWG S I QSRYI SMSVWTS PRR LVE 
GCCGCTATGGAAAGGAGAAGGCTCTGGG43AAGCATTCAGTCCAGGTATATCTCCATC 

Gene : PRAME 

Segments : 2 
Offset : 16 

1st Codon : 1 

YISMSVWTSPRRLVELAGQSLLKDEALAIA 
TACATTAGCATGAGCGTCTGGACAAGCCCTAC^AGACTGGTCGAGCTCGCCGGACAGTCCCTGCTCAAGGATGAGGCTCTG 

Gene : PRAME 

Segments : 3 
Offset : 31 
1st Codon : 1 

LAGQSLLKDEALAIAALELLPRELFPPLFM 
CTTOCrGGCCAAAGCCrCCTGAAAGACGAAGCCCTC^ 

Gene : PRAME

Segment # : 4
Offset : 46 
1st Codon : 1 

ALELLPRELFPPLFMAAFDGRHSQTLKAMV 
GCCCrCXSAGCTCCreCCTAGGGAACTGTTTCCCCCTCTGTTTATGGCTGC 



Gene : PRAME

Segment # : 5
Offset : 61
1st Codon : 1

AAFDGRHSQTLKAMVQAWPFTCLPLGVLMK
GCCGCTTTCGATGGCAGACACTCCCAGACACTCAAAGCCATGGTGCAAGCCTG 



Gene : PRAME 

Segments : 6 
Offset : 76 

1st Codon : 1 

QAWPFTCLPLGVLMKGQHLHLETFKAVLDG 
CAGGCTTGX5CCTTTCACATGCCTCCCCCTCGGCGTCCTGATGAAGGG 



Gene : PRAME 

Segments : 7 
Offset : 91 
1st Codon : 1 

GQHLHLETFKAV LDGLDVLLAQEVRPRRWK 
GGCCAACACCTCCACCTCGAGACATTCAAAGCCX3TCCTGGATGGCCTCGACGTCCTGC^ 



Gene : PRAME

Segment # : 8
Offset : 106
1st Codon : 1

LDVLLAQEVRPRRWKLQVLDLRKNSHQDFW
CTGGATGTGCTCCTGGCTC^GGAAGTGAGACCCAGAAGGTGGA^ 



Gene : PRAME 

Segments : 9 
Offset : 121 

1st Codon : 1 

LQVLDLRKNSH QDFWTVWSGNRASLYSF PE 
CTGCAAGTGCTCGACCTCAGGAAAAACTCCCACCAAGACTTTTGGACAGTGTGGAGCGGAAACAGAGCCTCC 



Gene : PRAME

Segment # : 10
Offset : 136
1st Codon : 1

TVWSGNRASLYSFPEPEAAQPMTKKRKVDG
ACCGTCTGGTCCGGCAATAC^GGCTAGCCTCTACrCCTTCCCTGAGCCT 



Gene : PRAME 

Segments : 11 
Offset : 151 
1st Codon : 1 

PEAAQ PMTKKRKVDGLSTEAEQPFI PVEVL 
CCCGAAGCCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGATGGCCTCAGCACA^ 



Gene : PRAME 

Segments : 12 
Offset : 166 
1st Codon : 1 

LSTEAEQPFI PVEVLVDLFLKEGACDELFS 
CTGTCCACCGAAGCCGAACAGCCTTTCATTCCCGTCGAGGTCCTGGTCGACCT 



Gene : PRAME

Segment # : 13
Offset : 181
1st Codon : 1

VDLFLKEGACDELFSYLIEKVKRKKNVLRL



Gene : PRAME 

Segments : 14 
Offset : 196

1st Codon : 1 

YLIEKVKRKKNVLRLCCKKLKI FAMPMQDI 
TACCTCATCGAAAAGGTCAAGAGAAAGAAAAACGTCCTGAGACTGTGTTGCAAAAAGCTCAAGATTTTCGCTATGCCTATGCAAGACATT 

Gene : PRAME

Segment # : 15
Offset : 211 
1st Codon : 1 

CCKKLKIFAMPMQDIKMILKMVQLDSIEDL
TGCTGTAAGAAACTGAAAATCTTTGCCATGCCCATGCAGGATATCAAAATGATC 

Gene : PRAME 

Segments : 16 
Offset : 226 

1st Codon : 1 

KMI LKMV QLDSIEDLEVTCTWKLPTLAKFS 
AAGATGATCCTCAAGATGGTGCAACTGGATAGCATTGAGGATCTGGAAGTGACATGC^ 

Gene : PRAME 

Segment # : 17
Offset : 241 

1st Codon : 1 

EVTCTWKLPTLAKFSPYLGQMINLRRLLLS
GAGGTCACCTGTACCTGGAAGCTCCCCACACTC^CTAAGTTTAGCCCTTACCTCGGCCAAATGATTAACCTC^ 

Gene : PRAME 

Segments : 18 
Offset : 256 

1st Codon : 1 

PYLGQMINLRRLLLSHIHA SSYISPEKEEQ 
CCCTATCTGGGACAGATGATCAATCTGAGAAGGCTCCTGCTCAGCCATATCCATGCCTCCAGCTATATCT 

Gene : PRAME 

Segments : 19 
Offset : 271 

1st Codon : 1 

HIHASSYI SPEKEEQYIAQFTSQFLSL QCL 
(^CAT^CACGCTAGCTCCTACATTAGCCCTGAGAAAGAGGAACAGTATATCGCTCAGTTTACCTCCCA 

Gene : PRAME 

Segments : 20 
Offset : 286

1st Codon : 1 

YIAQFTSQFLSLQCLQALYVDS LFFLRGRL 
TACATTGCCCAATTCACAAGCCAATTCCTCAGCCTCCAGTGTCTGCAAGCCCIXTTACGTCGACT 

Gene : PRAME 

Segment # : 21
Offset : 301 

1st Codon : 1 

QALYVDSLFFLRGRLDQLLRHV MNPLETL S 
CAGGCTCTGTATGTGGATAGCCTCTTCTTTCTGAGAGGCAGACTGGATCAGCTCCTGAGAC^CGTCATGAATCCCCTCGAGACACTGTCC 

Gene : PRAME 

Segments : 22 
Offset : 316 

1st Codon : 1 

DQLLRHVMNPLETLSITNCRLSEGDVMHLS 
GACCAACTGCTCAGGCATGTGATGAACCCTCTGGAAACCCTCAGCATTACCAA 

Gene : PRAME 

Segments : 23 
Offset : 331 
1st Codon : 1 

ITNCRLSEGDVMHLSQSPSVSQLSVLSLSG 
ATCACAAACTGTAGGCTCAGCGAAGCCGATGTGATGCACCTCAGCCAAAGCCCTAGC^^ 

Gene : PRAME 

Segments : 24 

Offset : 346 

1st Codon : 1 

QSPSVSQLSVLSLSGVMLTDVSPEPLQALL
CAGTCCCCCTCCGTGTCCCAGCTCAGCGTCCTGTCCCTGTCCGGCGTCATGCTCACCGATGTGTCCCCCGAACCCCTCCAGGCTCTGCTC 

Gene : PRAME 

Segments : 25 
Offset : 361

1st Codon : 1 

VMLTDVSP EPLQALLERASATLQDLVFDEC 
GTGATGCTGACAGACGTCAGCCCTGAGCCTCTGC^GCCCTCCTGGAAAGGGCTAGCGCT 

Gene : PRAME 

Segment # : 26
Offset : 376 
1st Codon : 1 

ERASATLQDLVFDECG ITDDQLLALLPSLS 
GAGAGAGCCTCCGCCACACTGCAAGACCTCGTGTTTGACGAATGCGGAATCACAGACGATCAGCTCCTGGCTCTGCT 

Gene : PRAME 

Segment # : 27
Offset : 391 
1st Codon : 1 

GITDDQLLALLPSLSHCSQLTTLSFYGNS I 
GGCATTACCGATGACCAACTGCTCGCCCTCCTGCCTAGCCTCAGCCATTGCTCCCAGCTC^ 

Gene : PRAME 

Segments : 28 

Offset : 406 

1st Codon : 1 

HCSQLTTLSFYGNSI S ISALQSLLQHLI GL 
CACTGTAGCCAACTGACAACCCTCAGCTTTTACGGAAACrrCCATCTCCATCTCCGCCCTCCAGT 

Gene : PRAME 

Segments : 29 

Offset : 421 

1st Codon : 1

S I SALQSLLQHL IGLSNLTHVLYPVPLESY 
AGCATTAGCGCTCTGCAAAGCCTCCTGCAACACCTCATCGGACTGTCCA^ 

Gene : PRAME 

Segments : 30 
Offset : 436 
1st Codon : 1 

SNLTHVLYPVPLESYEDIHGTLHLERLAYL 
AGCAATCTGACACACGTCCTC^ATCCCGTCCCCCTCGAGTCCTACGA 

Gene : PRAME 

Segments : 31 
Offset : 451 
1st Codon : 1 

EDIHGTLHLERLAYLHARLRELLCELGRPS 
GAGGATATCCATGGCACACTGCATCTGGAAAGGCTrcCCTATCTC 

Gene : PRAME 

Segments : 32 
Offset : 466 
1st Codon : 1 

HARLRELLCELGRPSMVWLSANPCPHCGDR 
CACGCTAG^CTCAGC^AACTGCTCTGCGAACTGGGAAGGCCTAGCAT^ 

Gene : PRAME 

Segments : 33 
Offset : 481 

1st Codon : 1 

MVWLSAN PCPHCGDRTFYD PEPI LCPCFMP 
ATGGTCTGGCTCAGCGCTAACCCTTGCCCTCACTGT^ 

Gene : PRAME 

Segments : 34 
Offset : 496 

1st Codon : 1 

TFYDPEPI LCPCFMPNAA 
ACCTTTTACGATCCCGAACCCATTCTGTGTCCCTGTTTCATGCCCAATGCCGCT 

Gene : TRP2IN2

Segment # : 1
Offset : 1 
1st Codon : 1 

AALMETHLSSKRYTEEAGGFFPWLKVYYYR 
GCCGCTCTGATGGAGACACACCTCAGCTCCAAGAGATACACAGAGGAAGCCGGAGGCTTC 

Gene : TRP2IN2 

Segment # : 2
Offset : 16 

1st Codon : 1 

EAG GFFPWLKVYYYR FV IGLRVWQWEVI SC 
GAGGCTGGCGGATTCTTTCCCTGGCTGAAAGTGTATTACTATAGGTTTGTGA 

Gene : TRP2IN2 

Segment # : 3
Offset : 31 
1st Codon : 1 

FVI GLRVWQWEVI SC KLIKRATTRQPAA 
TTCGTCATCGGACTGAGAGTGTGGCAGTGGGAGGTCATCTCCTGCAAACTGAT^ 

Gene : NYNSOla 

Segments : 1 
Offset : 1 
1st Codon : 1 

AAMQAEGRGTGGS TGDADG PGGPG I PDGPG 
GCCGCTATGCAAGCCGAAGGCAGAGGCACAGGCGGAAGCACAGGCX3ATGCCGATGGCCCTGGCG 

Gene : NYNSOla 

Segments : 2 
Offset : 16 

1st Codon : 1

DADGPGGPGIPDGPGGNAGGPGEAGATGGR
GACGCTGACGGACCCGCAGGCCCTGGCATCCCCGATGGCCCTGGCGGAAACGCTGG^ 

Gene : NYNSOla 

Segments : 3 
Offset : 31 
1st Codon : 1 

GNAGGPGEAGATGGRGPRGAGAARASGPGG
G<3CAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACAGGCGGAAGGGGACCCAGAGGCGCTGGCGCTC 

Gene : NYNSOla 

Segments : 4 
Offset : 46 
1st Codon : 1 

GPRGAGAARASGPGGGAPRGPHGGAASGLN
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCGGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGC<X5ACTGAAT 

Gene : NYNSOla 

Segments : 5 
Offset : 61 
1st Codon : 1 

GAPRGPHGGAASGLNGCCRCGARGPESRLL 
GGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGGCCTCAACGGATGCTGTAGGTGTGGCGCrrAGGGGACCCGAAA 

Gene : NYNSOla 

Segments : 6 
Offset : 76 
1st Codon : 1 

GCCRCGARG PESRLLEFYLAMPFATPMEAE 
aSCTGTTGCAGATGCGGAGCCAGAGGCCCTGAGTCCAGGCTCCTGGAATTCTATCTGGCTATGCCT 

Gene : NYNSOla 

Segment # : 7
Offset : 91 

1st Codon : 1 

EFYLAMPFATPMEAELARRSLAQDAPPLPV 
GAGTTTTACCTCGCC^TGCCCTTTGCCACACCCATGGA 

Gene : NYNSOla 

Segment # : 8
Offset : 106 
1st Codon : 1 

LARRSLAQDAPPLPVPGVLLKEFTVSGNI L 
CTGGCTAGGAGAAGCCTCGCCCAAGACGCTCCCCCTCTGCCTGTGCCTGGCGTC 

Gene : NYNSOla 

Segments : 9 
Offset : 121 

1st Codon : 1 

PGVLLKEFTVSGNILTI RLTAADHRQLQLS 
CCCC^GTGCTCCTGAAAGAGTTTACCGTCAGCGGAAACATTCTGACA^ 

Gene : NYNSOla 

Segments : 10 
Offset : 136 

1st Codon : 1 

TIRLTAADHRQLQLS ISSCLQQLSLLMWIT 
ACCATTAGGCTCACCGCrrGCCGATCACAGACAGCTCCAGCTCAGCATTAGCTCCTGCCTCCAGCAACT 

Gene : NYNSOla 

Segments : 11 
Offset : 151 
1st Codon : 1 

ISSCLQQLS LLMWITQCFLPVFLAQPP SGQ 
ATCrCCAGCTGTCTGCAACAGCTCAGCCTCCTGATGTGG 

Gene : NYNSOla 

Segment # : 12
Offset : 166
1st Codon : 1

QCFLPVFLAQPPSGQRRAA
CAGTGTTTCCTCCCCGTCTTCCTCGCCCAACCCCCTAGCGGACAGAGAAGGGCTGCC 

Gene : NYNSOlb 

Segment # : 1
Offset : 1 
1st Codon : 1 

AAMLMAQEALA FLMAQGAMLAAQERRV PRA 
GCCGCTATGCTCATGGCTCAGGAAGCCCTCGCCTTTCTGATGGCCCAAGGCGCTATGCTCGCCGCrrCAGGAAAGGAGAGTGC 

Gene : NYNSOlb 

Segments : 2 
Offset : 16 
1st Codon : 1 

QGAMLAAQERRVPRAAEVPGAQGQQGPRGR 
CAGGGAGCCATGCTGGCTGCCCAAGAGAGAAGGGTCCCCAGAGCCGCTGAGGTCC 

Gene : NYNSOlb 

Segments : 3 
Offset : 31 

1st Codon : 1 

AEVPGAQGQQG PRGREEAPRGVRMAAR LQG 
GCCGAAGTGCCTGGCGCTCAGGGACAGCAAGGCCCTAGGGGAAGGGAAGAGGCTCCCAGAGGCGTCAGGATGGCCG 

Gene : NYNSOlb 

Segment # : 4
Offset : 46 
1st Codon : 1 

EEAPRGVRMAARLQGAA 
GAGGAAGCCCCTAGGGGAGTGAGAATGGCTGCCAGACTGCAAGGCGCTGCC 

Gene : LAGE1

Segments : 1 
Offset : 1 
1st Codon : 1 

AAMQAEGQGTGGSTGDADGPGGPGI PDGPG 
GCCGCTATGCAAGCCGAAGC^CAAGGC^CAGGCGGAAGCACAGGCGATGCCGATGGCCCTG^ 

Gene : LAGE1 

Segments : 2 
Offset : 16 
1st Codon : 1

DADGPGGPGI PDGPGGNAGG PGEAGATGGR 
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCTGGCGGACCCGGAGAGGCTGGCGCTACCGGAGGCAGA 



Gene : LAGE1 

Segments : 3 
Offset : 31 

1st Codon : 1

GNAGGPGEAGATGGRG PRGAGAARASGPRG 
GGCAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACAGGCGGAAGGGGACCCAGAGGCGCTGCCGCTGCCAGAGCCT*CCGGCCCTAGGGGA 



Gene : LAGE1 

Segments : 4 
Offset : 46
1st Codon : 1 

GPRGAGAARASGPRGGA PRG PHGGAASAQD 
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCAGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGCGCTCAGGAT 



Gene : LAGE1

Segment # : 5
Offset : 61 
1st Codon : 1 

GAPRGPHGGAASAQDGR CPCGARRPDSRLL 
G^CGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGCCCAAGACGGAAGGTGTCCCTGTGGCGCTAGGAGACCCGATAGCAGACTGCTC 



Gene : LAGE1

Segment # : 6
Offset : 76
1st Codon : 1

GRCPCGARRPDSRLLQLHITMPFSSPMEAE

Gene : LAGE1

Segment # : 7
Offset : 91
1st Codon : 1

QLHITMPFSSPMEAELVRRILSRDAAPLPR
CAGCTCCACATTACCATGCCCTTTAGCTCCCCCATGGAGGCTGAGCrCGTGAGAAGGATTCTGTCCAGGGATGCCGCTCCCCTCCCCAGA 



Gene : LAGE1 

Segments : 8 
Offset : 106 
1st Codon : 1 

LVRRI LSRDAAPLPR PGAVLKDFTVSGNLL 
CTC^TCAGGAGAATCCTCAGCAGAGACGCTGCCCCTCTGCCTAGGCCTGGCGCTGTGCTCAAGGATTTCACAGTGTCCGGCAATCTGCTC 



Gene : LAGE1 

Segment # : 9
Offset : 121 
1st Codon : 1 

PGAVLKDFTVSGNLLFI RLTAADHRQLQLS 
CCCGGAGCCGTCCTGAAAGACTTTACCGTCAGCG^AAACCTCCTGTTTATC^ 



Gene : LAGE1 

Segment # : 10
Offset : 136 
1st Codon : 1 

FIR LTAADHRQLQLS I S SCLQQLSL LMWI T 
TTCATTAC^CTCACCGCTGCCXSATCACAGACAGCTCCAGCTCAGCATTAGCTCCT 



Gene : LAGE1

Segments : 11 
Offset : 151 
1st Codon : 1 

I SS CLQQLSLLMWITQC FLPV FLAQAP SGQ 
ATCTCCAGCTGTCTGCAACAGCTCAGCCTCCTGATGTGGATTACCCAATGCT^ 



Gene : LAGE1 

Segments : 12 
Offset : 166 
1st Codon : 1 

QCFLP VFLAQAPSGQRRAA 
CAGTGTTTCCTCCCCGTCTTCCTCGCCCAAGCCCCTAGCGGACAGAGAAGGGCTGCC

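The gene-ordered listing above appears to follow a simple tiling pattern: each segment is at most 30 residues long and successive offsets advance by 15, so neighbouring segments overlap by 15 residues and only the last segment of a gene may be shorter. The following Python sketch reproduces that pattern for illustration only. The function name `overlapping_segments` and its parameters are hypothetical and are not taken from the specification; the test sequence is assembled from MAGE-3 segments 1 to 3 as listed.

```python
def overlapping_segments(sequence, length=30, step=15):
    """Tile `sequence` into overlapping windows.

    Returns a list of (segment_number, offset, peptide) tuples, where
    `offset` is 1-based as in the listing. The final window is allowed
    to be shorter than `length`.
    """
    if not sequence:
        return []
    segments = []
    start = 0
    while True:
        peptide = sequence[start:start + length]
        segments.append((len(segments) + 1, start + 1, peptide))
        # Stop once a window has reached the end of the sequence.
        if start + length >= len(sequence):
            break
        start += step
    return segments


# First 60 residues covered by MAGE-3 segments 1 to 3 in the listing.
parent = (
    "AAMPLEQRSQHCKPEEGLEARGEALGLVGA"  # segment 1 (offset 1)
    "QAPATEEQEAASSSS"                 # second half of segment 2
    "TLVEVTLGEVPAAES"                 # second half of segment 3
)

for number, offset, peptide in overlapping_segments(parent):
    print(number, offset, peptide)
```

With the default window of 30 and step of 15, a 47-residue input yields three segments whose last member has 17 residues, which matches the shape of the shorter terminal segments in the listing.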
Segments in scrambled order: 



MAGE- 1 #15 

APEEEIWEELSVMEVYDGREHSAYGEPRKL 
G CCCCTG AGGAAGAGATTTGGG AAG AG CTCAGCX3TCATGGAAGTGTATG ACGGAAGGGAACACTCCGCCT ATGGCG AACCCAGAAAGCTC 

MAGE- 1 #4 

EEVPTAGSTDPPQSPQGASAFPTTINFTRQ 
GAGGAAGTGCCTACCGCrGGCTCCACCGATCCCCCTCAGTCCCCCCAAGGCGCTAGCGCTTTCCCT 

PRAME #10 

TVWSGNRASLYSFPEPEAAQPMTKKRKVDG 
ACCGTCTGGTCCGGCAATAGGGCTAGCCTCTACTCCrc 

MAGE- 3 #14 

QIMPKAGLLIIVLAI IAREGDCAPEEKIWE 
C^GATTATGCCTAAGGCTGGCCTCCTGATTATCGTCCTGG 

PRAME #9 

LQVLDLRKNSHQDFWTVWSGNRASLYSFPE
CTGCAAGTGCTOSACCrCAGGAAAAACTCCCACCAAGACTTTTGGACAGTGTGGAGCGGA 

PRAME #8 

LDVLLAQEVRPRRWKLQVLDLRKNSHQDFW 
CTGGATGTGCTCCTG<3CTCAGGAAGTGAGACCCAGAAGGTGGAAGCTCCAGGTCCTGGAT 

NYNSOlb #2 

QGAMLAAQERRVPRAAEVP GAQGQQGPRGR 
CAGGGAGCCATGCTGGCTGCCCAAGAGAGAAGGGTCCCCAGAGCCGCTGAGGTCCCCTCAGCCCAAGGCCAACAGGGACCCAGAGGCAGA 

PRAME #24

QSPSVSQLSVLSLSGVMLTDVSPEPLQ ALL 
MAGE- 1 #17 

LTQDLVQEKYLEYRQVPDSDPARYEFLWG P 
CTGACAGAGGATCTGGTCCAGGAAAAGTATCTGGAATACAGACAGGTCCCCXjATAGCGATCCCGCTAGGTATC 

MAGE - 1 #6 

RQPSEGSSSREEEGPSTSCILESLFRAVI T 
AGGCAACCCTCCGAGGGAAGCTCCAGCAGAGAGGAAGAGGGACCCTCCACCTCCTGCATTCTGGAAAGCCTCTTCAGAGCCGTCATCACA 

BAGE #1 

AAMAARAVFLALS AQLLQARLMKEE SPVVS 
PRAME #34 

TFYDPEP I LCPCFMPNAA 
ACCTTTTACGATCCCGAACCCATTCTGTGTCCCTGTTTCATGCCCAATGCCGCT 

MAGE- 3 #12 

IELMEVDPIGHLYIFATCLGLSYDGLLGDN
ATCGAACTGATGGAGGTCGACCCTATCGGACACCTCTACATTTTCGCTACCTGTCTGGGACT^ 

GAGE- 1 #2 

RRYVEPPEMIGPMRPEQFSDEVEPATPEEG 
AGGAGATACGTCGAGCCTCCCGAAATGATTGGCCCTATGAGACCCGAACAGTTTAGCGATGAGGTCGAGCCTGCCACACCCGAAGAGGGA 

TRP2IN2 #2 

EAGGFFPWLKVYYYRFVIGLRVWQWEVISC 
GAGGCTGGCGGATTCTTTCCCTGGCTGAAAGTGTATTACTATAGGTTTGTGATTGGCCTCAG 

PRAME #1

AAMERRRLWGSIQSRYI SMSVWTSPRRLVE 
GCCGCTATGGAAAGGAGAAGGCTCTGGGGAAGCATTCAGTCCAGGTATATC 

TRP2IN2 #1 

AALMETHLSSKRYTEEAGGFFPWLKVYYYR 
GCCGCTCTGATGGAGACACACCTCAGCTCCAAGAGATACACAGAGGAAGCCGGAGGCTTTTTCCCTTGGCT 

MAGE-1 #1

AAMSLEQRSLHCKPEEALEAQQEALGLVCV
gccgctatgtccctggaacagagaagcctccactgtaagcctgaggaagccctosaggctcagcaagaggctct 

MAGE- 1 #3 

QAATSSSS PLVLGTLEEVPTAGSTDPPQS P 
CAGGCTGCC^CAAGCTCCAGCTCCCCCCTCGTGCTCGGCACACTGGAAGAGGTCCCCACAGCCGGAAGCACA^ 

PRAME #4 

ALELLPRELF PPLFMAAFDGRHSQTLKAMV 
GCCCTCGAGCTCCTGCCTAGGGAACTGTTTCCCCCTCTGTTTATGGCTGCCTTTGACGGAAGGCATAGCCAAA 

MAGE- 3 #16 

ELSVLEVFEGREDS I LGDPKKLLTQHFVQE 
GAGCTCAGCGTCCTGGAAGTGTTTGAGGGAAGGGAAGACTCCATCCTCGGCGATCCCAAAAAGCTCCTGACACA 

MAGE-1 #11 

ESLQLVFGIDVKEAD PTGHSYVLVTCLGLS 
GAGTCCCTGCAACTGGTCTTCGGAATCGATGTGAAAGAGGCTGACCCTACCGGACACTGCTACGTCCTGGTCACCTGTC^ 

MAGE-3 #5

PD PPQS PQGASSLPT TMNYPLWSQSYEDS S 
CCCGATCCCCCTCAGTCCCCCCAAGGCGCTAGCTCCCTGCCTACCACAATGAATTACCCTCTGTGGA 

LAGE1 #1 

AAMQAEGQGTGGST G DADGPGGPGI PDGPG 
GCCGCTATGCAAGCCGAAGGCCAAGGCACAGGCGGAAGCACAGGCGATGCCGATGGCCCTGGCGGACCCGGAATCCCTC 

NYNSOla #12 

QCFLPVFLAQPPSGQRRAA 
CAGTGTTTCCTCCCCGTCTTCCTCGCCCAACCCCCTAGCGGACAGAGAAGGGCTGCC 

gp100In4 #2

TWGEGLPSQPIIHTCVYFFLPDHLSFGRPF
ACCTGGGGCGAAGGCCTCCCCTCCCAGCCTATCATTCACAC^ 

MAGE-1 #7 

STSCILESLFRAVITKKVADLVGFLLLKYR
AGCACAAGCTGTATCCTCGAGTCCCTGTTTAGGGCTGTGATTACCAAAAAGGTCGCCGATCTGGTCGGCTTTCTGCT 

NYNSOla #1 

AAMQAEGRGTGGSTGDADGPGGPGI PDGPG 
GCCGCTATGCAAGCCGAAGGCAGAGGCACAGGCGGAAGCACAGGCGATGCCGATGGCCCTGGCGGACCCGGAATCCCTGACGGACCCGGA 

GAGE- 1 #7 

DGPDGQEMDPPNPEEVKTPEEEMRSHYVAQ 
GACGGACCCGATGGCCAAGAGATWACCCTCCCAATCCCGAAGAGGTCAAGACACCCGAAGAGGAAATGAGA^ 

NYNSOla #11 

ISSCLQQLSLLMWITQCFLPVFLAQPPSGQ
ATCTCCAGCTGTCTGCAACAGCTCAGCCTCCTGATGTGGATTACCCAATC 

PRAME #26 

ERASATLQDLVFDECGITDDQLLALLPSLS
GAGAGAGCCTCCGCCACACTGCAAGACCTCGTGTTTGACGAATGCGGAATCACAGACGATCAGCTCCTGGCTCTGCTCCCCT 

MAGE -3 #17 

LGDPKKLLTQHFVQENYLEYRQVPGSDPAC 
CTGGGAGACCCTAAGAAACTGCTCACCCAACACTTTGTGCAAGAGAATTACCTCGAGTATAGGCAAGTGC 

MAGE-1 #2 

EALEAQQEALGLVCVQAATSSSSPLVLGTL 
GAGGCTCTGGAAGCCCAACAGGAAGCCCTCGGCCrCGTGTGTGTGCAAGCCGCTAC 

NYNSOla #7 

EFYLAMPFATPMEAELARRSLAQDAPPLPV 
GAGTTTTACCTCGCCATGCCCTTTGCCACACCCATGGAGGCTGAGCTCGCCAGAAGGTGCCTGGCTCAGGATGCCCCTCCCCTCCCCGTC 

NYNSOlb #4 

EEAPRGVRMAARLQGAA 
GAGGAAGCCCCTAGGGGAGTGAGAATGGCTGCCAGACTGCAAGGCGCTGCC 

BAGE #3

WRLEPEDGTALCFI FAA 
TGGAGACTGGAACCCGAAGACGGAACCGCTCTGTGTTTCATTTTCGCTGCC 

GAGE - 1 #3 

EQFSDEVEPATPEEGEPATQRQDPAAAQEG 
GAGCAATTCTCCGACGAAGTGGAACCCGCTACCCCTGAGGAAGGCGAACCCGCTACCCAAAGGCAAGACCCTGCCGCTGCCCAAGAGGGA 

MAGE- 3 #6 

TMNYPLWSQSYEDSSNQEEEGPSTFPDLES 
ACCATGAACTATCCCCTCTGGTCCCAGTCCTACGAAGACTCCAGCAATCAGGAAGAGGAAGGCCCTAGCACAT^ 

MAGE -3 #7 

NQEEEG P STFPDLESE FQAALSRKVAELVH 
AACCAAGAGG AAGAGGG ACCCTCCACCTTTCCCGATCTGG AAAG CG AATTCCAAGCCG CTCTGTCCAGG AAAGTGGCTG AGCTCGTGCAT 

PRAME #13

VDLFLKEGACDELFSYLIEKVKRKKNVLRL
GTGGATCTGTTTCTGAAAGAGGGAGCCTGTGACGAACTGTTTAGCTATCTGATTC

NYNSOla #10

TI RLTAADHRQLQLS I SSCLQQLSLLMW IT 
ACCATTAGGCTCACCG CTG C CGATCACAGACAGCTCCAGCTCAG CATTAGCTCCTGCCTCCAGCAACTGTCCCTGCTCATGTGG ATCACA 

MAGE- 3 #1 

AAMPLEQRSQHCKPEEGLEARGEALGLVGA 
GCCGCTATGCCTCTGGAACAGAGAAGCCAACACTGTAAGCCTGAGGAAGGCCTCGAGGCT 

NYNSOla #2 

DADGPGG PG I PDGPGGNAGGPGEAGATGGR 
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCTGGCGGACCCGGAGAGGCTGGCGCTACC 

MAGE- 3 #19 

YEFLWG PRALVETSYVKVLHHMVKI SGG PH 
TACGAATTCCTCTGGGGACCCAGAGCCCTCGTGGAAACCTCCTACGTCAAGGTCCTGCATCACATGGTGAAAATCTCCGGCG 

PRAME #23 

ITNCRLS EGDVMHLSQSPSVSQLSVLSLSG 
ATCACAAACTGTAGGCTCAGCGAJ^GGCG ATGTGATGCACCTC AG CCAAAGCCCTAGCG 

MAGE- 3 #18 

NYLEYRQVPGSDPACYEFLWGPRALVETSY 
AACTATCTX3GAATACAGACAGGTCCCCGGAAGCGATCCCGCTTGCTATGAGTTTCTGTGGGGCCCTAGGGCTCTC 

MAGE- 3 #11 

VIFSKASSSLQLVFGIELMEVDPIGHLYIF
GTGATTTTCTCCAAGG CTAG CTCC AGC CTCCAGCTCGTGTTTGGCATTGAGCTCATGG AAGTGGATCCCATTGGCCATCTGTATATCTTT 

PRAME #21 

QALYVDS LFFLRGRLDQLLRHVMNPLETLS 
CAGGCTCTGTATGTGGATAGCCTCTTCTTTCTGAGAGGCAGACTGGATCAGCTCCT 

PRAME #20 

YIAQFTSQFLSLQCLQALYVDSLFFLRGRL 
TACATTGCCCAATTCACAAGCCAATTCCTCAGCCTCCAGTGTCTC 

PRAME #7 

GQHLHLETFKAVLDGLDVLLAQEVRPRRWK 
GGCCAACACCTCCACCTCGAGACATTCAAAGCCGTCCTGGATGGCCTCGACGTCCTGCTCGCCCAAGAGGTCAGGCCTAGGAGATGGAAA 

LAGE1 #10 

FI RLTAADHRQLQLSISSCLQQLSLLMWIT 
TTCATTAGGCTCACCGCTGCCGATCACAGACAGCTCCAGCTCAGCATTAGCTCCTGCCTCCAGCAACTGTCCCT^ 

PRAME #15 

CCKKLKI FAMPMQDI KMILKMVQLDSI EDL 
TGCTGTAAGAAACTGAAAATCTTTGCCATGCCCATGCAGGATATCAAAATGATTCTGAAAAT 

NYNSOla #5 

GAPRGPHGGAASGLNGCCRCGARGPESRLL 
GGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGGCCTCAACGGATGCTGTAGGTGTG^ 

MAGE-1 #8

KKVADLVGFLLLKYRARE PVTKAEMLESVI 
AAGAAAGTGGCTGACCTCGTGGGATTCCTCCTGCTCAAGTATAGGGCTAGGGAACC^ 

MAGE-1 #13 

YDGLLGDNQIMPKTGFLI IVLVMIAMEGGH 
TACGATGGCCTCCTGGGAGACAATCAGATTATGCCTAAGACAGGCTTTCTGATTATCGTCCTGGTCATGATTGCCA 

PRAME #29

SISALQSLLQHLIGLSNLTHVLYPVPLESY 
AGCATTAGCGCTCTGCAAAGCCTCCTGCAACACCTCATCGGACTGTCCAACCTCAC^ 

MAGE-3 #15

IAREGDCA PEEKI WEELSVLEVFEGREDS I 
ATCGCTAGGGAAGGCGATTGCGCTCCCGAAGAGAAAATCTGGGAGGAACTGTCCG 

PRAME #22 

DQLLRHVMNPLETLS I TNCRLSEGDVM HLS 
GACCAACTGCTCAGGCATGTGATGAACCCTCTGGAAACCCTCAGCATTACCAATTGCA^ 

MAGE-1 #19 

RALAETSYVKVLEYVIKVSARVRFFFPSLR 
AGGGCTCTGGCTGAGACAAGCTATGTGAAAGTGCTCGAGTATGTGATTAAGGTC 

PRAME #30 

SNLTHVLYPVPLESYEDI HGTLHLERLAY L 
AGCAATCTGACACACGTCCTGTATCCCXJTCCCCCTCGAGTCCTACGAAG 

NYNSOlb #1 

AAMLMAQEALAFLMAQGAMLAAQ ERRVPRA 
GCCGCTATGCTCATGGCTCAGGAAGCCCTCGCCTTTCTGATGGC^ 

MAGE-1 #10

KNY KHCFPEI FGKASESLQLVFGID VKEAD 
AAGAATTACAAACACTGTTTCCCTGAGATTTTCGGAAAGGCTAGCGAAAGCCTCCAGCTCGTGTTTGGCATTGACG 

MAGE- 3 #4 

TLVEVTLGEVPAAESPDP PQS PQGASSLPT 
ACCCTCGTGGAAGT^ACACTGGGAGAGGTCCCCGCTGCCGAAAGCCCTGACCCTCCGCAAAGCCCTCAGGGAGCCTCCAGCCTCCCCACA 

PRAME #32 

HARLRELLCELGRPSMVWLSANPCPHCGDR 
CACGCTAGGCTCAGGGAACTGCTCTGCGAACTGGGAAGGCCTAGCATGGTGTGGCTGTCCGCCAATCCCT 

PRAME #25 

VMLTDVSPEPLQALLERASATLQDLVFDEC 
GTGATGCTGACAGACGTCAGCCCTGAGCCTCTGCAAGCCCTCCTGGAAAGGGCTAGCGCTACC^ 

GAGE - 1 #5 

EDEGASA GQGPKP EADSQEQGH PQTGCECE 
GAGGATGAGGGAGCCTCCGCCGGACAGGGACCCAAACCCGAAGCCGATAGCCAAGAGCAAGGCCATCCCCAAACGGGATCCGAATGCGAA 

MAGE- 3 #10 

EMLGSVVGNWQYFFPVIFSKASSSLQLVFG 
GAGATGCTGGGAAGCGTCGTGGGAAACTGGCAGTATTTCTTTCCCGTCATCTTTAGCAAAGCCTCCAGCTCrc 

GAGE- 1 #1 

AAMSWRGRSTY RPRP RRYVE PPEMI GPMRP 
GCCGCTATGTCCTGGAGAGGCAGAAGCACATACAGACCCAGACCCAGAAGGTATGTGGAACCCCCTGAGATGATCGGACCCATGAGGCCT 

PRAME #2 

YI SMSVWTSPRRLVELAGQSLLKDEALAIA 
TACATTAGCATGAGCGTCTGGACAAGCCCTAGGAGACTGGTCGAGCTCGCCGGACAGTC 

MAGE-1 #16 

YDGREHSAYGEPRKLLTQDLVQEKY LEYRQ 
TACGATGGCAGAGAGCATAGCGCTTACGGAGAGCCTAGGAAACTGCTCACGCAAGACCTCGTGCAAGAGAAATACCTCGAGTATAGGCAA 

LAGE1 #12 

QCFLPVFLAQAPSGQRRAA 
CAGTGTTTCCTCCCCGTCTTCCTCGCCCAAGCCCCTAGCGGACAGAGAAGGGCTGCC

MAGE-3 #20

VKVLHHMVKISGGPHISYPPLHEWVLREGE
GTGAAAGTGCTCCACCATATGGTCAAGATTAGCGGAGGCCCTC 

LAGE1 #7 

QLHITMPFSSPMEAELVRRILSRDAAPLPR 
CAGCTCCAC ATTACCATGCCCTTTAGCTCC CCCATGGAGGCTGAG CTCGTG AG AAGG ATT CTGTCCAGGGATGCCGCTCCCCTCCCCAG A 

NYNSOla #9 

PGVLLKEFTVSGNILTIRLTAADHRQLQLS
CCCGGAGTGCTCCTGAAAGAGTTTACCGTCAGCGGAAACATTCTGACAATCAGACTGACAGCCGCTGA 

PRAME #16 

KMI LKMVQLDS IEDLEVTCTWKLPTLAKFS 
AAGATGATCCTCAAGATGGTGCAACTGGATAGCATTGAGGATCTGGAAGTGACATGC^ 

MAGE- 1 #14 

FLI IVLVMIAMEGGHAPEEEIWEELSVMEV 
TTCCTCATCATTGTGCTCGTGATGATCGCTATGGAAGGCGGACACGCTCCCGAAGAGGAAATCTGG^ 

PRAME #17 

EVTCTWKLPTLAKFSPYLGQMINLRRLLLS 
GAGGTCACCTGTACCTGGAAGCTCCCCACACTGGCTAAGTTTAGCCCTTACCTCGGCCAAATC 

MAGE- 3 #2 

EGLEARGEALGLVGAQAPATEEQEAAS SSS 
GAGGGACTGGAAGCCAGAGGCGAAGCCCTCGGCCTCGTGGGAGCCCAAGCCCCTGCCACAGAGGAACAGGAAGCCGCTAGCTCCAG 

MAGE- 3 #21 

ISYPPLHEWVLREGEEAA 
AT CTCCTACCCTCCCCTC CACG AATGGGTCCTGAGAG AGGGAGAGGAAGCCG CT 

PRAME #19 

HIHASSYISPEKEEQYIAQFTSQFLSLQCL 
CACATTCACGCTAGCTCCTACATTAGCCCTGAGAAAGAGGAACAGTATATCGCTCAGTTTACCTCCCAGTTTCT 



GNAGGPGEAGATGGRGPRGAGAARASG PGG 
GGCAATGCCGGAGGCCCTGGCGAAGCCGGAGCCACAGGCGGAAGGGGACCCAGAGGCGCTGG CG CTG C C AG AG CCTCCGGCCCTGGCGG A 

NYNSOla #4 

GPRGAGAARASGPGGGAP RG PHGGAAS G L N 
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGCGGACCCGGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGCGGACTGAAT 

MAGE- 1 #5 

QGASAFPTTINFTRQRQPSEGSSSREEEGP 
CAGGGAGCCTCCGCCTTTCCCACAACCATTAACTTTACCAGACAGAGACAGCCTAGCXiAAGGCT 

NYNSOla #8 

LARRSLAQDAPPLPVPGVLLKEFTVSGNIL 
CTGGCTAGGAGAAGCCTCGCCCAAGACGCTCCCCCTCTGCCTGTGCCTGGCGTCCTGCTCAAGGAATTCACAGTGT 

PRAME #5 

AAFDGRHSQTLKAMVQAWPFTCLPLGVLMK
GCCGCTTTCGATGGCAGACACTCCCAGACACTGAAAGCCATGGTGCAAGCCTGGCCCTTTAC 

MAGE- 1 #20 

I KV SARVR F FF PS LREAA LR EEE EGVAA 
ATCAAAGTGTCCGCCAGAGTGAGATTCTTTTrCCCTAGCCTCAGGGAAGCCGCTCTGAGAGAGGAAGAGGAAGGCGTCGCCGCT 

PRAME #27 

GITDDQLLALLPSLSHCSQLTTLSFYGNSI 
GGCATTACCGATGACCAACTGCTCGCCCTCCTCCCTAGCCTCAGCCATTGCTC^ 

GAGE- 1 #8 

VKT PEEEMRSHYVAQTGI LWLLMNNCFLNL 
GTGAAAACCCCTGAGGAAGAGATGAGGTCCCACTATGTGGCTCAGACAGGCATTCTGTGGCTGCTCATGAATAACTGTT^ 

LAGE1 #11 

ISSCLQQLSLLMWITQCFLPVFLAQAPSGQ
ATCTCCAGCTGTCTGCAACAGCTCAGCCrrCCTGATGTGGATTACCCAATGCTTTCTGCCT 



NYNSOla #3 



Figure 27 (Cont) 
PRAME #14

YLI EKVKRKKNVLRLCCKKLKI FAMPMQD I 
TACCTCATCGAAAAGGTCAAGAGAAAGAAAAACGTCCTGAGACTGTGTTGCAAAAAGCTCAAGATTC 

MAGE-1 #9

AREPVTKAEMLESVI KNYKHCFPEI FGKAS 
GCCAGAGAGCCTGTGACAAAGGCTGAGATGCTGGAAAGCGTCATCAAAAACTATAAGCATTGCTTTCCCGAA^ 

LAGE1 #8 

LVRRILSRDAA PL PR PGAV LKDFTVSGN LL 
CTGGTCAGGAGAATCCTCAGCAGAGACGCTGCCCCTCTGCCTAGGCCTGGCGCTGTGCTCAAGGATTC 

PRAME #28

HCSQLTTLSFYGNSI SI SALQSLLQHLI GL 
CACTGTAGCCAACTGACAACCCTCAGCTTTTACGGAAACTCCATCTCCATCTCCGCCCTC 

PRAME #33 

MVWLSANPCPHCGDRTFYDPEPI LCPC FMP 
ATGGTCTGGCTCAGCGCTAACCC^GCCCTCACTGTGGCGATAGGACATTCTATGACCCTGAGCCTATCCTCTGCCCTTGCTTTATGCCT 

gp100In4 #1

AASWSQKRSFVYVWKTWGEGLPSQPI I HTC 
GCCGCTAGCTGGAGCCAAAAGAGAAGCTTTGTGTATGTGTGGAAGACATGGGGAGAGGGACTGCCTAGCC^ 

BAGE #2 

LLQARLMKEES PVVSWRLEPEDGTALC FI F 
CTGCTCCAGGCTAGGCTCATGAAAGAGGAAAGCCCTGTGGTCAGCTGGAGGCTCGAGCCTGAGGATGGCACAGCCCr 

gp100In4 #3

VYFFLPDHLS FGR PFHLNFCDFLAA 
GTGTATTTCTTTCTGCCTGACCATCTGTCCTTCGGAAGGCCTTTCCAtCTC 

PRAME #18

PYLGQMINLRRLLLSHIHASSYISPEKEEQ
CCCTATCTGGGACAGATGATCAATCTGAGAAGGCTCCTGCTCAGCCATATCCATGCCTCCAGCTATATCTCCCCCGAAAAGG 

MAGE -3 #3 

QAPATEEQEAASS SSTLVEVTLG EVPAAES 
CAGGCTCCCGCTACCGAAGAGCAAGAGGCTGCCTCCAGCTCCAGCACACTGGTCGAGGTCACCCTCGGCGAAGTGGCTGCCGCTGAGTCC 

PRAME #6 

QAWPFTCLPLGVLMKGQHLHLETFKAVLDG
CAGGCTTGGCCTTTCACATGCCTCCCCCTCGGCGTCCTGATGAAGGGACAGCATCTGCATCTGGAAACCTTTAAGGCTGTGCTCGACGGA 

PRAME #12 

LSTEAEQPFIPVEVLVDLFLKEGACDELFS
CTGTCCACCGAAGCCGAACAGCCTT^CATTCCCGTCGAGGTCCTGGTCGACCTCTTCCTCAAGGAAGGCGCITGCGAT^ 

NYNSOlb #3 

AEVPGAQGQQGPRGREEAPRGVRMAARLQG
GCCGAAGTGCCTGGCGCTCAGGGACAGCAAGGCCCTAGGGGAAGGGAAGAG^CTC 



LAGE1 #5

GAPRGPHGGAASAQDGRCPCGARRPDSRLL



LAGE1 #4 

GPRGAGAARASGPRG GAPRGPHGGAASAQD 
GGCCCTAGGGGAGCCGGAGCCGCTAGGGCTAGGGGACCCAGAGGCGGAGCCCCTAGGGGACCCCATGGCGGAGCCGCTAGCGCTCAGGAT 

PRAME #3 

LAGQSLLKDEALA IAALELLPRE LF PPLFM 
CTGGCTGGCCAAAGCCTCCTGAAAGACGAAGCCCTCGCCATTGCCGCTCTGGAACTGCTCCCCAGAGAGCTCTTCCCTCCCCTCTTC 

GAGE-1 #4 

EPATQRQDPAAAQEGEDEGASAGQG PKPEA 
GAGCCTG CCACACAG AG ACAGG ATCCCGCTGCCGCTCAGG AAGGCG AAGACG AAGGCG CT AGCG CTGGCC AAGGCCCT AAGCCTGAGGCT 

PRAME #11 

PEAAQPMTKKR KVDGLSTEAEQP FI PVEVL 
CCCGAAGCCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGATGGCCTCAGCACAGAGGCTGAGCAACCCTTTATCCCTGTGGAAGTGCTC 



Figure 27 (Cont) 
LAGE1 #6

GRCPCGARRPDSRLLQLHITMPFSSPMEAE 
GGCAGATGCCCTTGCGGAGCCAGAAGGCCTGACTCQAGG 

LAGE1 #9

PGAVLKDFTVSGNLLFI RLTAADHRQLQLS 
CCCGGAGCCGTCCTGAAAGACTrTACCGTCAGCGGAAACCTCCTGTTTATCAGACTGACAGC^ 

PRAME #31 

EDIHGTLHLERLAYLHARLRELLCELGRPS 
GAGGATATCCATGGCACACTGCATCTGGAAAGGCTCG^CTATCTGC^ 

GAGE-1 #6

DSQEQGHPQTGCECEDGPDGQEMDPPNPEE 
GACTCCCAGGAACAGGGACACCCTCAGACAGGCTGTGAGTGTGAGGATGGCCCTGACGGACAGGAAATGGATCCCCCT 

TRP2IN2 #3

FVIGLRVWQWEVI SCKL I KRATTRQPAA 
TT CGTC ATCGG ACTG AG AGTGTGGC AGTGGG AGGTCATCTCCTG CAAACTGATTAAGAG AGCCACAACCAGACAGCCTGCCGCT 

LAGE1 #2

DADGPGGPGIPDGPGGNAGGPGEAGATGGR
GACGCTGACGGACCCGGAGGCCCTGGCATTCCCGATGGCCCTGGCGGAAACGCT 

MAGE-1 #12

PTGHSYVLVTCLGLSYDGLLGDNQIMPKTG 
CCCACAGGCCATAGCTATGTGCTCX5TGACATGCCTCGGCCTCAGCTATGACGGACTC 

MAGE-3 #9

FLLLKYRAREPVTKAEMLGSVVGNWQYFFP 
TTCCTCCTGCTCAAGTATAGGGCTAGGGAACCCGTCACCAAAGCCGAAATGCTCGGCTCCGTGGTCGGC 

GAGE-1 #9

TG I LWLLMNNCFLNLSPRKPAA 
ACCGGAATCCTCTGGCTCCTGATGAACAATTGCTTTCTGAATCTGTCCCCCAGAAAGCCTGCCGCT 

MAGE-3 #8

EFQAALSRKVAELVHFLLLKYRAREPVTKA 
GAGTTTCAGGCTGCCCTCAGCAGAAAGGTCGCCGAACTGGTCCACTTTCTGCTCCTGAAATACAGAGCCAGAGAGCCTGTGACA 

MAGE-1 #18

VPDSDPARYEFLWGPRALAETSYVKVLEYV 
GTGCCTGACTCCGACCCTGCCAGATACGAATTCCTCTGGGGACCCAGAGCCCTCGCCGAAACCTCCTACGTCAAGGTCCTGGAATACGTC 

NYNSOla #6

GCCRCGARG PESRLLEFYLAMPFATPMEAE 
GGCTGTTGCAGATGCGGAGCCAGAGGCCCTGAGTCCAGGCTCCTGGAATTCTATCTG 

MAGE- 3 #13 

ATCLGLSYDGLLGDNQIMPKAGLLI IVLAI 
GCCACATGCCTCGGCCTCAGCTATGACGGACTGCTCGGCGATAACCAAATCATGCCCAA 

LAGE1 #3

GNAGGPGEAGATGGRGPRGAGAARASGPRG
GGC AATG C CGG AGG C C CTGG CG AAG C CGGAG CCACAGG CGG AAGGGGACC CAGAGG CG CTGG CG CTGCCAG AG CCTC CGGCCCT AGGGG A 
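Each peptide in the listing is paired with a nucleotide sequence that should encode it. One way to check a pair, and to spot character-recognition errors in either line, is to translate the nucleotide line with the standard genetic code and compare the result with the peptide line. The Python sketch below is illustrative only: the function name `translate` is hypothetical, the standard genetic code is assumed, and the example uses the LAGE1 segment 12 pair as it appears in the gene-ordered listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string, read in frame from its first base.

    Spaces are ignored. A KeyError is raised for any codon containing a
    character other than A, C, G or T, which flags garbled input.
    """
    dna = dna.replace(" ", "").upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# LAGE1 segment 12 as listed (peptide line and nucleotide line).
peptide = "QCFLPVFLAQAPSGQRRAA"
dna = "CAGTGTTTCCTCCCCGTCTTCCTCGCCCAAGCCCCTAGCGGACAGAGAAGGGCTGCC"

print(translate(dna) == peptide)
```

Nucleotide lines that end in a truncation mark or contain non-nucleotide characters cannot be checked this way without first being restored from a clean copy of the figure.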

Artificial Protein: 



APEEE I W EELS VMEVYDGREHSAYGE PRKLEEVPTAGSTDP PQS PQGAS AFPTT I NFTRQTVWSGNRAS LYS FPE PEAAQPMTKKRKVDGQIMPKAGL 
LIIVIAIIAREGDCAPEEKIWEL^VLDLRKNSHQDFWTVWSGNRASLYSFPEI^VL^ 

EVPGACGGWPRGRQSPSVSQLSVLSLSGVMLTDVSPEPLQALLLTQDLVOEKYLEYRQVPDSDPARYEFLWGPRQPSEGSSSREEEGPSTSCILESL 
FRAVITAAMAARAVFLALSAQLLQARLMKEESPVVSTFYDPEPI LCPCFMPNAAI ELMEVDPIGHLYI FATCLGLSYDGLLGDNRRYVEPPEMIGPMR 
PEQFSDEVEPATPEEGEAGGFFPWLKVYYYRFVIGLRWQWEVISCAAMERRRLWGSIQSRYISMSVVTTSPRRLVEAAL^ 

WLKVYYYRAAMS LEQRS LH C K PE EALEAQQEALG LVCVQ AATSS S S PLVLGTLEEVPTAGSTDPPQSPALELLPRELFPPLFMAAFDGRHSQTLKAMV 
ELS VLE V F EG R EDS I LGD P KKLLTQH FVQE ES LQL VFG I DVKEADPTGHS YVLVTCLGLS PD P PQS PQG AS S LPTTMNY PLWSQS YEDS S AAMQAEGQ 
GTGG STGDADG PGG PG I PDG PGQC FLP VFLAQ P P SGQRRAATWG EG L P SQ P 1 1 HTCVY F F LPDHLS FGR P FSTS C I LES LFRAVI TKK VAD LVG FLLL 
KYRAAMQAEGRGTGGSTGDADGPGGPG1 PDGPGIXaPDGQEMDPPNPEEVKTPEEEMRSHYVAQISSCLQQLSLLMWITC^FLPVFXAQPPSGQERASA 
TLQDLVFDECGITDDQLLALLPSLSLGDPKKLLTQHFVQENYLEYRQVPGSDPAC 

AEIJiRRSIAQDAPPLPVEEAPRGVRMAAJlLQGAAWRIXPEIXSTALCFIFAAEQFSDEVEPATPEEGEPA 

EEEG PSTFPDLESNQEEEGPSTFPDLESEFQAALSRKVAELVHVDLFLKEGACDELFSYLI E KVKRKKNVLRLT I RLTAADHRQLQLS ISSCLQQLS L 
LMWITAAMPLEQRSQHCKPEEGLEARGEALGLVGADADGPGGPGI PIX3PGGNAGG PG EAGATGGR YEFLWGPRALVETSYVKVLHHMVKISGGPHITN 
CRLS EGDVMHLSOS PS VSQLSVLS LSGNYLEYRQVPGSDPACYEFLWGPRALVETSYVI FSKAS S S LQLVFG I ELMEVDPI GHLY I FQALYVDSLFFL 
RGRI-DQLLRHVMNPLETLSYIAQFTSQFI^LCCI^ALYTO 
LQQL^LI^MWITCCKKLKIFAMPMQDIKMILKW^ 
SVIYDGLIXJDNQIMPKTGFLI IVTjVMI AMEGGHSISALQ^ 
HVMNPLETLSITNCRLSEGDVMHLSRAIAETSYVKV^^ 

MACGAMLAAQERRVPRAKNYKHCFPEI FGKASESLQLVFGIDVKEADTLVEVTLGEVPAAESPDPPQS PC^ASSLPTHARLRELLCEIjGRPSMVWLSA 
NPCPHCGDRVMLTDVSPEPLQALLERASATLQDLVFDECEDEGASAGC^PKPEADSQ 

G AAMSWRGRSTYR PRPRRYVE P PEMIG PMRPYI SMS VWTS PRRLVELAGQSLLKDEALAI AYDGR EHSA YGEPRKLLTQDLVQEKYLEYRQQCFLPVF 

LAQAPSGQRRAAVKVLHHMVK I SGG PH I SY PPLH EWVLREGEQLHI TMP FSS PMEAELVR R I LS RDAAPLPRPGVLLKEFTVSGNILTIRLTAADHRQ 

LQLSKMI LKMVQLDS I EDLEVTCTWKt.PT LAK FS FLI I VLVM I AMEGGHAPEEE I WEELS VMEVEVTCTWKLPTLAKFSPYLGQMINLRRLLLSEGLE 

ARGEALGLVGAQAPATEEQEAASSSSISYPPLHEVfVI^EGEEAAHIHASSYISPEKEE 

ASGPGGGPRGAGAARASGPGGGAPRGPHGGAASGLNC^ASAFPTTINPTRQRQPSEGSSSREEEGPIJUl^^ 

FDGRHSCTLKAJWQAWPFTCLPLGVI^KIKVSARVRFFFPSLREAAIJIEEE 

VAC/TGILWLIJ4N^FLJfLISSCLC^LSLL^ 

CFPEI FGKASLVRRI LSPJ>AAPLPRPGAVLKDFTVSGNLLHCSQLTTLSFYGNSI SI SAIXJSLLQHLIGLIWWLSANPCPHCGDRTFTOPEPILCPCF 
MPAASWSQKRSFVYVWKTWGE<3LPSQPI I HTCLLQARLMKEESPVVSWRLEPEDGTALCFI FVYFFLPDHLSFGRPFHLNFCDFLAAPYLGQMINLRR 
LLLSHIHASSYISPEKEEC^APATEEQEAASSSSTLVEVTLGEVPAAESQAWPFTCL^ PVEVLVDLF 
IJCEGACDELFSAEVPGAQGQQGPRGREEAPRGVRMAARLQGGAPRGPHGG 

AQDLAGQS LLKDEALAI AALELLPR ELF P PLFME PATQRQDPAAAQEGEDEGASAGQGPKPEA PEMQPWTKIO*KVDGLSTEAEQPFIPVEVLGRCPC 
GARRPDSRIjLQLHITMPFSSPMEAEPGAVLJCDFTVSGNLLFIRLTAADHRQLQLSEDIHGTLW 

CEDGPDGQEMDPPNPEEFVIGLRWQWEVISCKLIKRATTRQPAADADGPGGPGI PDGPGGNAGGPGEAGATGGRPTGHSYVLVTCLGLSYDGLLGDN 
QIMPKTGFLLLKYRARE PVTKAEMLGSWGNWQYFF PTG I LWLLMNNCFLNLS PRKPAAEFQAALSRKVAELVHFLLLKYRAREPVTKAVPDSDPARY 

EFLWG PRALAETS YVKVLEYVGCCRCGARG pesrllefylamp FATPMEAEATCLGLS ydgllgdnqimpkaglliivlaignaggpgeagatggrgp 
RGAGAARASG PRG 

Artificial DNA:



gcccctgaggaagagatttgggaagagctcagcgtcatggaagtgtatg 

gcctaccgctgcctccaccgatccccctcagtccccccaaggcgctagcgcttt^ 

atagggctagcctctactccttccctgagcctgaggctgcccaacccatgaccaaa 

ctgattatcgtcctggctatcattgccagagagggagactgtgcccctgaggaaaagatttgggaa^ 

ccaagacttttggacagtgtggagcggaaacagagcctccctgtatagctttcccgaactgg^ 

ggaagctccaggtcctg^tctgagaaagaatagccatcaggatttctggcagggagccatgct 

gaggtccccggagcccaaggccaacagggacccagaggcagacagtccccctccgtgtcccagctcagcgtcc^ 

cgatgtgtcccccgaacccctccaggctct<xrrcctc^ 

ctaggtatcagtttctgtggggccctaggcaaccctccgaggga^ 

ttcagagccgtcatcacagccgctatggctgccagagccgtcttcctcgccctcagcgctcagctcctgcaag 
cgtcgtgtccaccttttacgatcccgaacccattcix5tgtccctgtttcatgccc 

tctacattttcxkttacctgtctgggactgtcctacgatggcctcctgggagacaataggagatacgtcgagcctcccgaaatgattgg 

cccgaacagtttagcgatgaggtcgagcctgccacacccgaagagggagaggctgg 

gattggcctcagggtctggcaatgggaagtgattagctgtgccgctatggaaaggag^ 

<xx;tgtggacctcccccagaaggctc<3tggaagccgctctgatggagacacacctcagctccaagagata 

tggctcaaggtctactattacagagccgctatgtccctggaacagagaag 

gggactggtctgcgtccaggctgccacaagctccagctcccccctcgtgctcggcacactgga^ 

aaagccctgccctcgagctcctgcctagggaactgtttccccctctg 

gagctcagcgtcctggaagtgtttgagggaagggaagactccatcctcggcgatcccaaaaagc^ 

gcaactggtcttcggaatcgatgtgaaagaggctgaccctaccggacactcctacgtcctggtcacctgtctgggactgtcccc 

ccccccaaggtcctagctccctgcctaccacaatgaattaccctctgtggagccaaagctatgaggatagctccgccgctatc 

ggcacaggcggaagcacaggcgatgccgatggccctggcggagccggaatccctgacggacccggacagtgtttcct 

ccctagcggacagagaagggctgccacctggggcgaaggcctcccctccca 

gctttggcagaccctttagcacaagctgtatcctggagtccctgtttagggctgtgatt 

aaatacagagccgctatccaagccgaaggcagaggcacaggcggaagcacaggcgatc 

agacggacccgatggccaagagatggaccctcccaatcccgaagaggtcaagacacccgaagaggaaatgagaagccattacgtcgcccaaatcrc 

gctgtctgcaacagctcagcctcctgatgtggattacccaatgctttctgcctgtgtt^ 

acactgcaagacctcgtgtttgacgaatgcggaatcacagacgatcagctcctggctctgctcccctccctgt 

cacccaacactttgtgcaagagaattacctcgagtataggcaagtgcctgg 

gcctcgtctgtctgcaagccgctacctccagctccagccctctggtcctggga^ 

gctgagctcgccagaagctccctggctcaggatgcccctcccctccccgtcgaggaagc 

tgcgtggagactxxiaacccgaagacggaaccgctctgtgtttcattttc^ 

gcgaacccgctacccaaaggcaagaccctcccgctgcccaagaggg 

gaagaggaaggccctagcacattccctgacctcgagtccaaccaagaggaagaggg

tctgtccaggaaagtggctgagctcgtgcatgtggatctgtttctgaaagaggga 

ggaaaaagaatgtgctcaggctcaccattaggctcaccgctgccgatcacagacagctccagctcagcatt

ctcatgtggatcacagccgctatgcctctggaacagagaagccaacactgtaagcctgaggaaggcctc 

cggcgctgacgctgac<5gacccggaggccctggcattcccgatggccctggcggaaacgctgg 

acgaattcctctggggacccagagccctcgtggaaacctcctacgtcaaggtcctgcatcacatggtgaaaatctccggcggacc 

tgtaggctcagcgaaggcgatgtgatgcacctcagccaaagccctagcgtcagccaactgtccgtgctcagcctcagcggaaactatctc 

acaggtccccggaagcgatcccgcttgctatgagtttctgtggggccctagggctctggtcgagacaagctatgtgattttctccaaggctagctcca

gcctccagctcgtgtttggcattgagctcatggaagtggatcccattggccatctgtatatctttcaggctctgtatgtggatagcctctt 

agagggagactggatcagctcctgagacacgtcatgaatcccctcgagacactgtcctacattgg 

tctgcaagccctctacgtcgactccctgtttttcctcaggggaaggcrrcggrc 

acgtcctgctcgcccaagaggtcaggcctaggagatggaaattcattaggctcaccgctgccgatcacagacagctccagctcagcattagctcctgc 
ctccagcaactgtcgctgctcatgtggatcacatgctgtaagaaactgaaaatctttgccatgcccatgcaggatatc 
CCAGCTCGACTCCATCGAAGACCTCGGCGCrCCCAGAGGCCCTC^

CCGAAAGCAGACTGCTCAAGAAAGTGGCTGACCTCGTGGGATTCCTCCTGCTCAAGTATAGGGCTAGGGAACC 
TCCGTGATTTACGATGGCCTCCTGGGAGACAATCAGATTATGC 

TAGCATTAGCGCTCTGCAAAGCCTCCTGCAACACCTCATCGGACTGTCCAACCTCACCCATGTGCTCTACCCT 
GGGAAGGCGATTGCGCTCCCGAAGAGAAAATCTGGGAGGAACTGTCCGTGCTCGAGGTCTTCGAAGGCAGAGAGGATAGC 
CATGTGATGAACCCTCTGGAAACCCTCAGCATTACCAATTGCAGACTGTCCGAGGGAGACGTCATGCATC^ 
TGTGAAAGTGCTCGAGTATGTGATTAAGGTCAGCGCTAGGGTCAGGTTTTTCTTTCCCTCCCT 

CCCTCGAGTCCTACGAAGACATTCACGGAACCCTCCACCTCGAGAGACTGGCTTACCTCGCCX5CTATGCTCATGGCT 
ATGGCCCAAGGCGCTATGCTCGCCGCTCAGGAAAGGAGAGTGCCTAGGGCTAAGAATTACAAACACTGTTTCCCTGAGATTT^ 

CCCAAAGCCCTCAGGGAGCCTCCAGCCTCCCCACACACGCTAGGCTCAGGGAACTGCTCTGCGA^ 

AATCCCTGTCCCCATTGD^AGACAGAGTGATGCTGACAGACGTCAGCCCTGAGCCTCTGCAAGCC 

TCTGGTCTTCGATGAGTGTGAGGATGAGGGAGCCTCCGCCGGACAGGGACCCAAACCCGAAGCCGATA 

GCGAATGCGAAGAGATGCTGGGAAGCGTCGTGGGAAACTGGCAGTATTTCTTTCCCGTCATCT 

GGAGCCGCTATGTCCTGGAGAGGCAGAAGCACATACAGACCCAGACCCAGAAGGTA 

TAGCATGAGCGTCTGGACAAGCCCTAGGAGACTGGTCGAGCTCGCCGGACAGTCCCTGCTCAAGGATGAGGCTCTGGCTATCGCTTACGATGGCAGAG
AGCATAGCGCTTACGGAGAGCCTAGGAAACTGCTCACCCAAGACCTCGTGCAAGAGAAATACCTCGAGTATAGGCAACAGTGTTT 
CTCGCCCAAGCCCCTAGCGGACAGAGAAGGGCTGCCGTGAAAGTGCTCCACCATATGGTCAAGATTAGCGGAGGCCCTCAC^ 
GCATGAGTGGGTGCTCAGGGAAGGCGAACAGCTCCACATTACCATGC^ 

ATGCCGCTCCCCTCCCCAGACCCGGAGTGCTCCTGAAAGAGTTTACCGTCAGCGGAAACATTCTGACAATCAGACTGACAGCCG 

CTGCAACTGTCCAAGATGATCCTCAAGATGGTGCAACTGGATAGCATTGAGGAT 

CTCCTTCCTCATCATTGTGCTCGTGATGATCGCTA^^ 

TCACCTGTACCTGGAAGCTCCCCACACTGGCTAAGTTTAGCCCTTACCTCGGCCAAATGATTAACCT 

CGAATGGGTCCTGAGAGAGGGAGAGGAAGCrcCTCACATTCACGCTAGCTCCTAC^TTAGCCCTGAG 

GCCTCCGGCCCTGGCGGAGGCCCTAGGGGAGCCXSGAGCCGCTAGGGCTAGCGGACCCGGAGGCGGAGCCCCTAGG^ 

CGGACTGAATCAGGGAGCCTCCGCCTTTCCCACAACCATTAACTTTACCAGACAGAGACAGCCTAGCGAAGGCT 

CTCTGGCTAGGAGAAGCCTCGCCCAAGACGCTCCCCCTCTGCCTG^ 

TTCGATGG(ZAGACACTCCCAGACACTGAAAGCCATC<3TC 

CAGAGTGAGATTCTTTTTCCCTAGCCTCAGGGAAGCCGCT^ 

TCCTGCCTAGCCrCAGCCATTGCTCCCAGCTCACCACACTCTC 

GTGGCTCAGACAGGCATTCTGTGGCTGCTCATGAATAACTGTTTCCT 

CCAATGCTTTCTGCCTGTGTTTCTGGCTCAGGCTCCCT 

AAAAGCTCAAGATTTTCGCTATGCCTATGCAAGACATTGCCAGAGAGCCTGTGACAAAGGCTGAGATGCTGGAAAGCGTC 
TGCTTTCCCGAAATCTTTGGCAAAGCCTCCCTGGTCAGGAGAATCCTCAGCAGAGACGCTGCCCCTCTGCCT 
CACAGTGTCCGGCAATCTGCTCCACTGTAGCCAACTGACAACCCTCAGCTTTTACGGAAACTCCATCT 
ATCTGATTGGCCTCATG^TCTGGCTCAGCGCTAA 

ATGCCTGCCGCTAGCTGGAGCCAAAAGAGAAGCTTTGTGTATGTGTGGAAGACATGGGGAGAGGGA 

GCTC CAGGCTAGGCT CATG AAAGAGGAAAGCCCTGTGGTCAGCTGGAGGCTCG AGCCTG AGGATGGCACAG CCCTCTIt3CTTTATCrrTTGTGTATTTCT 
TTCTGCCTGACCATCTGTCCTTCGGAAGGCCTTTCCATCTC 

CTCCTGCTCAGCCATATCCATGCCTCCAGCTATATCTCCCCCGAAAAGGAAGAGCAACAGGCTCCCGCTACCGAAGAGCAAGAG^ 

CAGCACACTGGTCGAGGTCACCCTCGGCGAAGTGCCTGCCGCTGAGTCCCAGGCTTGGCCTTTCACATGCCTCCCCCT 

AGCATCTGCATCTGGAAACCTTTAAGGCTGTGCTCGACGGACTGTCCACCGAAGCCGAACAG 

CTCAAGGAAGGCGCTTGCGATGAGCTCTTCTCCX5CCGAAGTGCCTGGCGCTCAGGGACAGCAAGGCCCT 

CAGGATGGCCGCTAGGCTCCAGGGAGGCGCTCCCAGAGGCCCTCACGGAGGCGCTGCCTCCGCCCAAGAC 

CCGATAGCAGACTGCTCGGCCCTAGGGGAGCCGGAGCCGCTAG^CTAG<^ 

GCTCAGGATCTGGCTGGCCAAAGCCTCCTGAAAGACGAAGCCCTCGCCATTGCCGCTCTGGAACTGCTCCCCAGA 

GGAGCCTGCCACACAGAGACAGGATCCCGCTGCCGCTCAGGAAGGCGAAGACGAAGGCGCTAGCGCTGGCCAAGGCCCTAAGCCTGA

CCGCTCAGCCTATGACAAAGAAAAGGAAAGTGGATGGCCTCAGCACAGAGGCTGAGCAACCCTTTATCCCTGTGGAAGTGCTCGGCAGATGCCCTTGC

GGAGCCAGAAGGCCTGACTCCAGGCTCCTGCAACTGCATATCACAATGCCTTTCTCC^ 

TACCGTCAGCGGAAACCTCCTGTTTATCAGACTGACAGCCGCTGACCATAGGCAACT 

GGCTCX3 CCT ATCTGCATGCCAG ACTG AG AG AGCTCCIX3TG TGAGCTCGGCAGACCCT 

TGTGAGGATGGCCCTGACGGACAGGAAATGGATCCCCCTAACCCTGAGGAATTCGTCATCGGACTGAGAGTGTG 
ACTGATTAAGAGAGCCACAACCAGACAGCCTGCCGCTGACGCTGACGGACCCGGAGGCCCTGGCATTCCrc 

CCGGAGAGGCTGGCGCTACCGGAGGCAGACCCACAGGCCATAGCTATGTGCTCGTGACATGCCTCGGCCTCAGCTATGACGGACTGCTCGG 

CAAATCATGCCCAAAACCGGATTCCTCCTGCTCAAGTATAGGGCTAGGGAACCCGTCACCAAAGCCGAA^ 

ATACTTTTTCCCTACCGGAATCCTCTGGCTCCTGATGAACAATTGCTTTCTGAATCTGTCCCCCAGAAAGCCTGCC 

GCAGAAAGGTCGCCGAACTGGTCCACTTTCTGCTCCTGAAATACAGAGCCAGAGAGCCTGTGACAAAGGCTGTG

GAATTCCTCTGGGGACCCAGAGCCCrCGCCGAAACCTCCTACGTCAAGGTCCTGGAATACGTCGGC^ 

CAGGCTCCTGGAATTCTATCTGGCTATGCCTTTCGCTACCCCTATGGAAGCCG 

ACCAAATCATGCCCAAAGCCGGACTGCTCATCATTGTGCTCGCCATTGGCA^ 

AGAGGCGCTGGCGCTGCCAGAGCCTCCGGCCCTAGGGGA 
Cassettes for construction of a full-length HIV Savine

Cassette A1

ggatccaccATGACAGGCCCTTGCACAAACGTCAGCACCGTGCAATGCACACACGGAATCAGACCCGTCGTGTCCA 

CCCAACTGCTCCTGAATGGCTCCCTGAGAAGCCTCTACAATACCGTCGCCACACTGTGGTGCGTCCACCAAAGGAT 

TGACGTCAGGGACACAAAGGAAGCCCTCGACAAAATCGAACTCGGCGATGGCGGAGGCGCTGAAAGGCAAGGCACC 

TCCAGCTCCTTCAACTTTCCACAAATCACAtTGTGGCAAAGGCCTCTGGTCACCGAACCCTTCAGAAAAAAGAATC 

CCGATATGGTGATTTACCAGTACATGGACGATCTGTATGTGGGAAGCGATCTGGAAATCGGACAGCATTTTACCAC 

ACCCGATAAGAAACACCAAAAGGAACCACCATTCCTCTGGATGGGATACGAACTGCATCCCGATAGGTGGACCGTC 

CAGCCTCTTAATTTCCCTCAGATTACCCTCTGGCAGCGTCCCCTCGTGACAATCAAAATCGGCGGACAGCTCATAG 

AGGCTCTGCTCGACACAGGCTCCTATGGCAGAAAGAAACGTAGGCAACGTAGACGCGCTCCTCAGAGCAGCAAGGA 

TCACCAATACCCTATCTCTGAGCAACCCCTCTCCTTCTTTAGGGAAAACCTGGCTTTCCAGCAAGGTAAAGCCAGA 

GAGTTTTCCAGCGAACAGACAAGAGCCAATAGCTCCGCCTCCAGGAAGAGCCCCCAAATCTCCGGCGAAAGCTCCG

TCATTCTGGGATCTGGCACCAAAAACGCCGCTACTAGAAGAATCGAAGTGAAAGATACCAAAGAGGCTTTGGATAA 

GATTGAGGAGGTGCAAAAGAAAAGCGAGCAAAAGACACAACAGGCTGCCGCTAAAGCCGGATACGTCACCGATAGG 

GGAAGGCAAAAGATTATCTCCCTGACAGAGACAACCAATCAGAAAACCGAACTGCATGCCATTCAAGAAGCCACTA 

CCACACTGTTTTGCGCCAGCGATGCCAAAGCCTATGAGACAGAGGTCCACAATGTGTGGGCCACACACGCTTGCGT 

CCCCGCTGACGATACAGTGCTGGAGGAGATGAACCTCCCCGGAAAATGGAAGCCTAAGATGATTGGCGGAATCGGC 

GGATTCATTAAGGTGAGAAAAATCGGACCCGAAAACCCTTACAATACCCCAATCTTCGCTATCAAGAAAAAGGACT 

CCACCAAATGGAGAAAGCTCGTGGATTTCAGAGTTAGGATTATCAATATCCTCTACCAAAGCAATCCCTATCCTAG 

CTCCGAAGGCTCCAGGCAAACCAGAAAGAATAGGAGAAGGAGATGGGGAGGCGAACGGGGTAGGGATAGGTCCGTG 

AGACTGGTCAACGGATTCTTAGCCCTCGCCTGGGACGATCTGAGAAACCTCTGCCTCTTCGAAAACCTCTGGGTCA 

CCGTCTACTATGGCGTCCCCGTCTGGAGAGAGGCTGCCACAACCCTCTTCTGTGCCTCCGACGCTAAGGCTTACGC 

TGCCATGGCTGGCAGAAGCGGCGGCACAGACGAAGAGCTCCTGAGGGCTATCAGAATCATTAACATTCTGTATCAG 

TCCAACCCTTACCCTTCCGCTAGTATGAGAATCAGAACCTGGAACAGCCTGGTCAAGCATCACATGCACATCTCCA

AGAAAGCCAAAGGCTGGTTCTATAGGCATCACTTTGAGGAGTCCGAGCTCGTGAATCAGATTATCGAAAAGCTCAT 

CAAAAAGGAAAAGGTCTACCTATCATGGGTACCAGCCCACAAGGGAATCGGACAAACCAAAGAGCTCCAGAAACAG 

ATTATCAAAATCCAAAACTTTAGGGTCTACTATAGGGATAGCAGAGACCCTATCTGGAAGGGACCCAAAAGCTTTG 

AGGAAATCTGGAACAATATGACATGGATTGAGTGGGAGAGAGAGATTAGCAATTACACAAGCCAAATCTATAAGAT 

TCTGAAACCCGAACCCACAGCCCCTCCCGCTGAGAATTTCAGATTCGGTGAGGAAACTACACCCTCCCAAAAGCAA

GAGCAAAAGGATAAGGAGCAATACGATCAGATTCTTATTGAGATTTGCGGCAAGAAAGCTATTGGTACGGTGCTCG 

TGGGACCTACCCCTGTGAATATCATTGGCAGAATTTACGAAACCTATGGCGATACCTGGGAGGGCGTCGAGGCTCT 

GATCAGAATCCTCCAGCAACTGATGTTTATCCATTTCAGAATCGGATGTTTTCATTGCCAAGTGTGTTTTCTCACC

AAAGGTCTCGGCATTAGCCACGGAAGGAAAAAGAGAAAACAGAGAAGGGGAGCTCCCCAAGCTGCCATGGACCCCG 

TGGACCCCAAGCTGGAGCCTTGGAAACACCCTGGCTCCCAGCCTAAGACAGCCTGTTACAAATGCTATTGCAAAAA

GTGCCCTAGCGAAGAGACAACCCCTAGCCAGAAACAGGAACAGAAAGACAAAGAACTCTACCCCCCTTTAGCCAGC 

CTCAAGTCCCTGTTTGGCAATGACAATTTCAATATGTGGAAGAATGACATGGTGGAACAGATGCAAGAAGACATTA

TCTTACTATGGGACCAAAGCCTCAAGCCTTGCGTCAAGCTCGACGTCGGCGATGCCTATTTCTCCGTGCCTCTGGA

TAAAAACTTCAGAAAGTATACCGCTTTCACAATCCCTAGCACAAACAATGAGCAACTGAAAGGCGAAGCCATCCAT

GGCCAAGTGAATTGCTCACCAGGCATTTGGCAACTGGATTGCACACACCTGGAGGGAAAGATTATCCCTAAGGTCA 

AGCAATGGCCTCTGACAGAGGAAAAGATTAAGGCTCTGACTGAGATTTGCAAAGAGATGGAGGAAGAGGGAAAGAT 

TAGCATGGATGACCTCTACGTCGGCTCCGACCTGG
AGATTGGCCAACATAGGACCAAAATCGAAGAGCTCAGGGAACACCTCCTGAAATGGGGACTCACGGAAACCACAAA

CCAAAAGACTGAGCTCCAAGCTATCCATCTGGCTCTGCAAGACTCCGGCTTAGAGGTCAACATTGTGACAGACATT 

CCCGCTGAGACTGGTCAAGAGACCGCCTTTTTCATTCTGAAACTGGCTGGCAGATGGCCTGTGAAAGTCATTCACA 

CAGACAATGGCAGGACAAAGATTGAGGAACTGAGACCGCATCTGCTCAAATGGGGCTTCACAACCCCTGACAAAAA 

GCATCAGAAAGAGCCTCCCTTTCTGTCTAGTGTCAAGAAACTGACAGAGGATAAGTGGAACGAACCCCAGAAAATC 

AAGAGACGCAGAGAAAATCACACAATGAATGGCCATACTGCCACAGAGTCCCAGAATCAGCAAGACAGAAACGAAA 

AGGAACTGCTGGAGCTCGACAAATGGGCAAGCCTCTGGAATTGGTTTAACATTACCGACACCGGAAATAGCTCCAA 

AGTGTCCCAGAATTACCCTATCGTCCAGAATGTCCAAGGCCAAATGGTCCACCAACCCCTCTCCCCCAGACTCATC 

GGACTGAGAATCGTTTTCGCTGTGCTCAGCATTATCAATAGGGTCAGGCAAGGCTATAGCCCTCTGTCCTTCCAAA 

CCCTCCCCCTCATCCATCTGCAATACTTTGACTGTTTCGCTGACTCCACCATTAGGAGAGCCATCTTGGGACACAT 

AGTGAGAAGGAGATGCGAATACGCTGTGGGACTCGGAGCCATGTTCCTTGGCTTTCTGGGTGCCGCTGGCTCCACC 

ATGGGCGCTGCCTCCATGACACTGACAGTGCAAGCCTATGACCCTAGCAAAGACCTCATTGCTGAGATTCAGAAAC 

AGGGCCAGGGTCAGTGGACATTTCAGATTTTCCAAGAGCCTTTCAAAAACGGAACCGTCCTGGTCGGCCCTACACC 

CGTCAACATCATCGGAAGGAACATGCTGACACAGCTTGGCCGCACTCTCAACTTTCCCATTAGCAAAGGCAGCCCT 

GCTATCTTTCAGTCCAGCATGCCACAGATTCTGGAGCCTTTTAGGATAAAAAACCCTGAGATGGTCATCTATCAGT 

ATCCTAGCCCTCTGACATTCGGATGGTGTTTCAAACTGGTCCCCGTGGACCCCAGCGAAGTGGAAGAGATCAACAA 

GGGCGAAAACAATTGCCCCCTGTTTAGGAAATACACAGCCTTTACCATTCCCTCCATCAATAACGAAACCCCTGGC 

ATTAGGTATCAGTATAACGTCCTGCCTCAGGGATGGGGAAGCACAATGGGAGCCGCCAGCATGACCCTCACCGTCC 

AGGCTAGGCTACTGCTCAGCGGAATCGTCCAGCAACAGAGCAATCTGCTGGAGGAGAATAGGGAAATCCTCAGAGA 

GCCTGTGCATGGCGTCTACTACGATCCCTCCAAGGATCTGGTCGCTGAAATCCAAAAGCAAGGCAGAGAGGAACTG 

TCCACCATGGTGGATATGGGAAACTACGACCTCGGAGTGGACAATAACCTCGCCGCTATTAGAATCCTGCAACAGC 

TCATGTTCATTCACTTTAGGATTGGCTGCCAGCACTCCAGGATTGGGATCATCCGTCAGAGAAGGGCCAGAGCTCC 

CAGGAAAAAGGGATGCTGGAAGTGTGGCAGAGAGGGACACCAGATGAAGGATTGCACTGAGAGACAGGCTAACTTT 

CTGGGAAAGGATGCCAGACTGGTTATCAAAACCTATTGGGGACTGCATACCGGTGAGAGAGACTGGCACCTCGGCC 

ATGGCGTCAGCATTGAGTGGAGGATAAGGGAAAGGGCTGAGGATAGCGGCAACGAAAGCGAAGGCGACACAGAAGA 

GCTCAGCACATTGGTGGACATGGGCAATTACGATCTGTCTAGCCCTGCCCCCAGGGGACCCGATAGGCTGGAGAGA 

ATCGAAGAGGAAGGCGGAGAGCAAGGCAGAGGCAGAAGCGTCAGGCTCGTGAATGGCAGAGAGGTCGAGGAAGTCA 

ATGAGGGAGAGAATAACTGTCTGCTTCACCCTATCAGTCAACATGGCATGGAAGACGAAGAGAGAGAGGTCAATAG 

CGATATCAAAGTGGTCCCCAGAAGGAAAGCCAAAATCATTAGGGATTAGGGAAAGCAAATGGCTGGCGATGACTGT 

GTGGCCAGCTTCTCTTCCGAGCAAACAGGGGCTAACTCCTCTACAAGCAGAAAGCTGGGAGACGGAGGCGGAGCCG 

ACAGACAGGGAACAAGCTCCAGCTGTTTCAATTGCGGCAAAGAGGGACACATTGCCAAAAACTGTAGGGCCCCTCG 

CAAGAAAGGTTGTTGGAAATGCGGAAAGGAAGGCCATCAAATGAAAGACTGTACCGAAAGGCAAGCCAATTTCCTC 

GGCAAAATCTGGCCCTCCAACAAAGGCAGACCGGGAAACTTTCTCCAAAGCAAATGGCTCTGGTATATCAAAATCT 

TTATCATGATCGTCGGTGGACTGATTGGCCTCAGGATTATCTTTGCCGTCCTGTCCATCGTTAACGGAGCCGTGAG 

CCGAGACCTCGATAAACATGGCGCTATTACAAGCTCCAATACCGCTGCCAATAACGCTGACTGTGTCTGGCTGAAG 

GCTGCTGCCATGACACCCCTGGAGATCATCGCTATCGTCGCCTTTATCGTCGCGCTCATCATAGCCATTGTGGTCT 

GGACAATCGTCTACATTGAGTATGTCGACtgaagatctgaattc
A2 fragment

ggatccaccATGACAGGCCCTTGCACAAACGTCAGCTCCGTGCAATGCACACACGGAATCAAACCCGTCGTGTCCA
CCCAACTGCTCCTGAATGGCTCCCTGAAAAGCCTCTACAATACCGTCGCCACACTGTGGTGTGTCCACCT^AAGGAT 
TGAGGTCAAGGACACAAAGGAAGCCCTCGACAAAATCGAACTCGGCGATGGCGGAGGCGCTGAAAGGCAAGGCACC 
TCCAGCTCCATCAACTTTCCACAAATCACACTGTGGCAAAGGCCTCTGGTCACCGAACCCTTCAGAAAAGAGAATC 
CCGAAATGGTGATTTACCAGTACATGGACGATCTGTATGTGGGAAGC 

ACCCGATAAGAAACACCAAAAGGAACCACdATTCCTCTGGATGGGATACGAACTGCATCCCGATAGGTGGACCGTC 
CAGCCTTTTAATTTCCCTCAGATTACCCTCTGGCAGCGTCCCCTCGTGACAATCAAAATCGGCGGACAGCTCATAG 
AGGCTCTGCTCGACACAGGCTCCTATGGCAGAAAGAAACGTAGGCAACGTAGACGCGCTCCTCAGAGCAGAAAGGA 
TCACCAATACCCTATCTCTGAGCAACCCCTCTCCTTCTTTAGGGAAAACCTGGCTTTCCAGCAAGGTAAAGCCAGA 
GAGTTTTCCAGCGAACAGACAGGAGCCAATAGCTCCGCCTCCAGGAAGAGCCCCCAAATCTCCGGCGAAAGCTCCG
TCATTCTGGGATCTGGCACCAAAAACGCCGCTACTAGAAGAATCGATGTGAGAGATACCAAAGAGGCTCTGGATAA 
GATTGAGGAGGAGCAAAACAAAAGCAAGCAAAAGACACAACAGGCTGCCGCTAAAGCCGGATACGTCACCGATAGG
GGAAGGCAAAAGATTATCTCCCTGACAGAGACAACCAATCAGAAAACCGAACTGCATGCCATTCAAGAAGCCGATA 
CCACACTGTTTTGCGCCAGCGATGCCAAAGCCTATGACACAGAGGTCCACAATGTGTGGGCCACACACGCTTGCGT
CCCCGCTGACGATACAGTGCTGGAGGAGATGAACCTCCCCGGAAAATGGAAGCCTAAGATGATTGGCGGAATCGGC 
GGATTCATTAAGGTGAGAAAGATCGGACCCGAAAACCCTTACAATACCCCAATCTTCGCTATCAAGAAAAAGAACT
CCACCAAATGGAGAAAGCTCGTGGATTTCAGAATTAGGATTATCAAAATCCTCTACCAAAGCAATCCCTATCCTAG 
CTCCGAAGGCACCAGGCAAACCAGAAAGAATAGGAGAAGGGGATGGGGAGGCGAACAGGGTAGGGATAGGTCCGTG 
AGACTGGTCAACGGATTCTTAGCCCTCGCCTGGGACGATCTGAGAAGCCTCTGCCTCTTCGACAACCTCTGGGTCA
CCGTCTACTATGGCGTCCCCGTCTGGAGAGAGGCTAACACAACCCTCTTCTGTGCCTCCGACGCTAAGGCTTACGC 
TGCCATGGCTGGCAGCAGCGGCAGCACAGACGAAGAGCTCCTGAAGGCTGTCAGAATCATTAAGATTCTGTATCAG 
TCCAACCCTTACCCTTCCGCTAGTATGAAAATCAGAACCTGGAAGAGCCTGGTCT^AGCATCACATGTACATCTCCA 
AGAAAGCCAATGGCTGGTTCTATAGGCATCACTTTGAGGAGTCCGAGGTCGTGAATCAGATTATCGAAAAGCTTAT 
CAAAAAGGAAAAGGTCTACCTATCATGGGTACCAGCCCACAAGGGAATCGGACGAACCAAAGAGCTCCAGAAACAG 
ATTATCAAAATCCAAAACTTTAGGGTCTACTATAGGGATAGCAGAGACCCTATCTGGAAGGGACCCAAAAGCCTTG 
AGGAAATCTGG/VACAATATGACATGGATTCAGTGGGAGAGAGAGATTAGCAATTACACT^AACCTAATCTATAAGAT 
TCTGAGACCCGAACCCACAGCCCCTCCCGCTGAGAATTTCGGATTCGGTGAGGAAACTACACCCTCCCAAAAGCAA 
GAGCCAAAGGATAAGGAGCAATACGATCAGATTATTATTGAGATTTGCGGCAAGAAAGCTATTGGTACAGTGCTCG 
TGGGACCTACCCCTGTGAATATCATTGGCAGAATTTACGAAACCTATGGCGATACCTGGGAGGGCGTCGAGGCTCT 
GATCAGAATCCTCCAGCAACTGATGTTTATCCATTTCAGAATCGGATGTTTTCATTGCCAAGTGTGTTTTCTCACC 
AAAGGTCTCGGCATTAGCCACGGAAGGAAAAAGAGAAAACAGAGAAGGCGAGCTCCCCAAGCTGCCATGGACCCCG 
TGGACCCCAACCTGGAGCCTTGGAAACACCCTGGCTCCCAGCCTAAGACAGCCTGTAACAAATGCTATTGCAAAAA 
GTGCCCTAGCGAAGAGACAACCCCTAGCCAGAAACAGGAACAGAAAGACAAAGAACTCTACCCCCCTTTAGCCAGC 
CTCAAGTCCCTGTTTGGCAATGACAATTTCAATATGTGGAAGAATAACATGGTGGAACAGATGCAAGAAGACATTA
TCTCACTATGGGACCAAAGCCTCAAGCCTTGCGTCAAGCTCGACGTCGGCGATGCCTATTTCTCCGTGCCTCTGGA 
TAAAAACTTCAGAAAGTATACCGCTTTCACAATCCCTAGCACAAACAATGAGCAACTGAAAGGCGAAGCCATGCAT 
GGCCAAGTGAATTGCTCACCAGGCATTTGGCAACTGGATTGCACACACCTGGAGGGAAAGATTATCCCTAAGGTCA 
AGC AATGG CCTC AG AC AG AGGAAAAGATTAAGG ACTG AGATTTG CACAG AG ATGG AG CAAG AGGG AAAGAT 

TAGCATGGATGACCTCTACGTCGGCTCCGACCTGGAGATTGGCCAACATAGGACCAAAATCGAAGAGCTCAGGGCA 
CACCTCCTGAGATGGGGACTCACCGACACCACAAACCAAAAGACTGAGCTCCACGCTATCCATCTGGCTCTGCAAG

ACTCCGGCTTAGAGGTCAACATTGTGACAGACATTCCCGCTGAGACTGGTCAAGAGACCACCTATTTCATTCTGAA 

ACTGGCTGGCAGATGGCCTGTGAGAATCATTCACACAGACAATGGCAGGACAAAGATTGAGGAACTGAGACCGCAT 

CTGCTCAAATGGGGCTTCACAACCCCTGACAAAAAGCGTCAGAAAGAGCCTCCCTTTCTGTCTAGTGTCAAGAAAC 

TGACAGAGGATAAGTGG7VACAAACCCCAGAAAATCAAGGGACACAGAGAAAATCACACAATGAATGGCCATGCTGC 

CACAGAGTCCCAGAATCAGCAAGACAGAAACGAAAAGGAACTGCTGGAGCTCGACAAATGGGCAAGCCTCTGGAAT 

TGGTTTAACATTACCGACACCGGAAGTAGCTCCCAAGTGTCCCAGAATTACCCTATCGTCCAGAATCTCCAAGGCC 

AAATGGTCCACCAACCCATCTCCCCCAGACTCGTCGGACTGAGAATCATTTTCGCTGTGCTCAGCATTATCAATAG 

GGTCAGGCAAGGCTATAGCCCTCTGTCCTTCCAAACCCTCACCCTCATCCATCTGTATTACTTTGACTGTTTCGCT 

GACTCCACCATTAGGAGAGCCATCCTTGGACACAGAGTGAGCAGGAGATGCGAATAGGCTGTGGGAATCGGAGCCA 

TGTTCCTTGGCTTTCTGGGTGCCGCTGGCTCCACCATGGGCGCTGCCTCCATCACACTGACAGTGCAAGCCTATGA 

CCCTAGCAAAGACCTCATTGCTGAGATTCAGAAACAGGGTCAGGATCAGTGGACATATCAGATTTTCCAAGAGCCT 

TTCAAAAACGGAACCGTCCTGGTCGGCCCTACACCCGTCAACATCATCGGAAGGAACCTGCTGACACAGATAGGCT 

GCACCCTCAACTTTCCCATTAGCAAAGGCAGCCCTGCTATCTTTCAGTCCAGCATGACACAGATTCTGGAGCCTTT 

TAGGAAACAAAACCCTGACATGGTCATCTATCAGTATCCTAGCCCTCTGACATTCGGATGGTGTTTCAAACTGGTC 

CCCGTGGACCCCAGCGAAGTGGAAGAGACCAACAAGGGCGAAAACAATTGCCTCCTGTTTAGGAAATACACAGCCT 

TTACCATTCCCTCCACCAATAACGAAACCCCTGGCATTAGGTATCAGTATAACGTCCTGCCTCAGGGATGGGGAAG 

CACAATGGGAGCCGCCAGCATGACCCTCACCGTCCAGGCTAGGCAACTGCTCAGCGGAATCGTCCAGCAACAGAAC 

AATCTGCTGGAGGAGAATAGGGAAATCCTCAAAGAGCCTGTGCATGGCGTCTACTACGATCCCTCCAAGGATCTGA 

TCGCTGAAATCCAAAAGCAAGGGACAGAGGAACTGTCCGCCTTGGTGGATATGGGAAACTACCACCTCGGAGTGGA

CAATAACCTCGCCGCTATTAGAATCCTGCAACAGCTCATGTTCATTCACTTTAGGATTGGCTGCCAGCACTCCAGG 

ATTGGCATCATCCGTCAGAGAAGGGCCAGAGCTCCCAGGAAAAAGGGATGCTGGAAGTGTGGCAAAGAGGGACACC 

AGATGAAGGATTGCACTGAGAGACAGGCTAACTTTCTGGGAAAGGATGCCAGACTGGTTATCAAAACCTATTGGGG 

ACTGCATACCGGTGAQAGAGACTGGCACCTCGGCCATGGCGTCAGCATTGAGTGGAGGACAAGGGAAAGGGCTGAG 

GATAGGGGCAACGAAAGCGAAGGCGACAGAGAAGAGCTCAGCACAATGGTGGACATGGGCAATTAGGATCTGTCTA 

GCCCTGCCCCCAGGGGACCCGATAGGCTGGAGAGAATCGAAGAGGAAGGCGGAGAGCAAGACAGAGACAGAAGCGT

CAGGCTCGTGAATGGCAGTGAGGGCGAGGAAGTCAATAAGGGAGAGAATAACTGTCTGCTCCACCCTATGAGTCAA 

CATGGCATGGAAGACGAAGACAGAGAGGTCAATAGCGATATCAAAGTGGTCCCCAGAAGGAAAGCCAAAATCATTA 

GGGATTACGG AAAG CAAATGGCTG AOG ATG ACTG TGTGGCCGG CTTC TCTTCCGAG C AAAC AAGGGCT AACTCCCC 

TGCAAGCAGAAAGCTGGGAGACGGAGGCGGAGCCGACAGACAGGGAACAAGCTCCAGCTGTTTCAATTGCGGCAAA 

GAGGGACACATTGCCAAAAGCTGTAGGGCCCCTCGCAAGAAAGGTTGTTGGAAATGC<5GAAGGGAAGGCCATCAAA 

TGAAAGACTGTAGCGAAAGGCMGCCAATTTCCTCGGCAAAATCTGGCCCTCCAAAAAAGGCAGACCCGGAAACTT 

TCTCCAAAGCAAATGGCTCTGGTATATCAAAATCTTTATCATGATCGTCGGTGGACTGATTGGCCTCAGGATTATC 

TTTGCCGTCCTGTCCATCATTAACGGGGCCGTGAGCCGAGACCTCGATAAACATGGCGCTATTACAAGCTCCAATA 

CCGCTGCCAATAACCCTGACTGTGTCTGGCTGGAGGCTGCTGCCATGACACCCCTGGAGATCATCGCTATCGTGGC 

CCTTATCGTCGCCCTCATCATAGCCATTGTGGTCTGGACAATCGTCTACATTGAGTATGTCGACtgaagatctgaa

ttc 
ggatccaccATGCTCGAGAATATGCTCACCCAAATCGGATGCACACTGAATTTCCCTATCTCCCCCATTGAGACAG

TGCCTGTGAAACTGAAACCCGGAATGGATGGCGCCGCCACCTTTAGGCCTGGCGGAGGCAATATCAAAGACAATTG 

GAGAAGCGAACTGTATAAGTATAAGGTCGTGAAGATTAAGCCTCTGGGAATCACATGGATTCCCGAATGGGAGTTC 

GTCAACACACCCCCACTGGTCAAGCTATGGTATCAGCTGGAGAAAGACCCTATCGTTGGCGTTGAGCCTCAGGATC 

TCAACACGATGCTGAATCTTGTAGGAGGCCATCAGGCCGCTATGCAAATGCTGAAAGAGACAATCAATGAGGAAGC 

CTCTGTCCTGTTTCTGGATGGCATTGACAAAGCTCAAGAGGAACATGAAAAGTATCACTCCAACTGGAGGACAATG 

GCCAACGACTTTAATCTGATGAAGCATCTCGTCTGGGCCTCTAGGGAGCTGGAGAGATTCGCTCTGAATCCCAGCC 

TGCTGGAGACATCCGAAGGCTGTCAGCAAATTGCTGAGGAAGAGATTATCATTAGGTCCGAGAATTTCACAAACAA 

TGTCAAAACCATTATCGTCCAACTCAACGAAAGCGTCGAGATTAACATGGGCGCTAGGGCTAGTGTCCTCAGAGGC 

GGCAAGCTGGACGCCTGGGAAAAGATTAGGCTCAGGCCTGGCGGAAAGAAAAAGTATAGGCTCAAGGAGAAGGGAG 

GCCTGGAGGGACTGGTTTACTCCAAAAAGAGGCAAGACATTCTGGATCTGTGGGTGTATAACACACAGGGATTCAC 

TAGATGGGGAACCATGATCCTCGGCTTGGTGATTATCTGTAGCGCCAGCGAGAATCTGTGGGTGACAGTGTATTAC 

GGAGTGCCTGTGTGGAGGAGACAGCTCCTGTCCGGCATTGTGCAACAACAAAATAACCTCCTGAGGGCTATCGAAG 

CCCAACAGCATCTGCTCCAGCTCACCGTCTGGGTCAGGCATTTCCCCAGGCCTTGGCTCCACGGCCTGGGACAGTA 

CATCTATGAGACATACGGAGACACATGGGCGGGAGTGGAAGCCCTCACAGCCCTCATCACACCC7VAAAAGATTAGG 

CCTCCCCTCCCATCCGTGAAAAAGCTCACCGAAGACAGATGGAATGAGCCTCAAAAGACATATAGCGCTGGCGAAA 

GGATTATCGATATCATTGGATCCGACATTCAGACTAAGGAACTGCAAAAGCAAATCCTAAAGATTCAGAATTTCGC 

TGTGTTTATCCATAACTTTAAGAGGAAGGGAGGCATTGGCGGCTACTCCGCCGGAGAGAGAATCATTGACATTATC 

GCCACCGATATCATTCCCGTGGGCGAAATCTATAAGAGATGGATCATTCTGGGACTCAACAAAATCGTGAGAATGT 

ATCTACCCGTCAGCATTCTGGATATCAGAGTGAGACAGGGATACTCCCCCCTCAGCTTTCAGACACTGCTGCCCGC 

TCCCAGAGGCCCTGACAGACTCGGAGGCATTGAGGAAGAGTCCAGCCAGGACCATCAGTATCCCATTCCCGAACAG 

CCTCTGCCTCAGACAAGGGGAGACAATCCCACAGACCCTAAGGAAAGCAAAAAGGCTAGTGGAGGGGTCGAGTCCA 

TGAATAAGGAACTGAAAAAGATTATCGGACAGGTCAGGGACCAGGCTGAGCACCTGAT^AACCGCTGTGCAAATGGC 

TGCCATGCAGATGCTCAAGGATACCATTAACGAAGAGGCTGCCGAGTGGGACAGAGTCCATCCCGTCCATGCCGGG 

CCCGTTCCCCCTCTCACCGAGATTTGTAAAGAAATGGAAAAAGAAGGCAAAATCTCCAAGATTGGCCCTGAGAATC

CCTATAACACACCCATCTTTGCCATTCAAGTGAGAGAGCAAGCCGAACACCTCAAGACAGCCGTCCAGATGGCAGT 

CTTCATTCACAATTTCAAAAGGAGAGGCGGAATCGGAGGCAAAAAGAAAGATAGCACAAAGTGGAGGAAACTGGTA

GACTTTAGGGAGCTCAACAAACGTACACAGGATTTCTGGGAGGTCCAGCTCGGCTTTTTGGCTCTGGCTTGGGATG

ACCTCAGGAGCCTGTGTCTGTTCAGCTATCACAGACTGAGAGACTTTATCCTCATCGTTGCCAGAATCTGCCGACA 

TAGCAGAATCGGCATCACTAGGCAACGTAGAGGTAGGAACGGCGCCTCCAGTTCCGCTGCCCCCAAAATCTCCTTC 

GACCCCATTCCCATTCACTATTGCGCTCCCGCTGGCTTCGCTATCCTCAAGTGTAACGATAAGAACTTCAATGGCG 

AAGAGGATTGGCATCTGGGACAGGGAGTGTCCATCGAATGGAGACAGAAAAGCTATAGCACACAGGTGGACCCTGA 

CCTCGCCGATCAGCCTAGCCTCTATCCTCCCTTAGCTTCCCTGAAAAGCCTCTTCGGAAACGATCCCTTATCCCAA 

GCCGCTAGAAGGGCTATCCTCGGCCATATAGTCAGGAGAAGGTGTGAGTATCAGTCCGGACACAATAAGGTCGGCT 

CCCTGCAATACCTCGCACTCAGTCAACCCACAACCGCTTGCTACAAGTGTTACTGTAAGAAATGTTGCTTCCACTG 

TCAGGTCTGCTTCCTGAAGAAGGGACTGGGAATCAGGGATTACGGAAAGCAAATGGCTGGCGATGACTGTGTGGCC 

AGCAGGCAAGACGAAGACGCAGCCAAGTACCATAGCAATTGGAGAACCATTGGCAATGAGTTTAACCTCCCCCCTA 

TCGTCCCTAAGGAAATCGTCGCAAATTGCAATAAGTGTAACGAATGGACACTGGAACTGCTGGAGGAACTGAAACA 

TGAAGCCGTGAGACACTTTCCCAGACCCTGGCTGCATGGCCTCGGTCAACACGATATCATTAGCCTCTGGGATCAG 
TCCCTGAAACCCTGTGTGAAACTGACACCCCTCTGCGTCACCCTCAACTGTACCAATGCCAATCTGATGAAGAGAT

ACTCCACCCAAGTGGACCCCGATCTGGCTGACCAACTGATTCACCTCCACTATTTCGATTGCTTTGCCGATAGCGC 

AATCCATCCCATCGGCCAACACGGAATGGAGGATGAGGATAGGGAAGTGCTGAAATGGAAATTCGATAGCCATCTG 

GCTCTCAGGCATATCGCTTCTAGTCCTATCGATACGGTCCCCGTCAAGCTCT^AGCCTGGCATGGACGGACCCAAAG 

TGAAACACTGGCCCCTCACCGAAGAGAAAATCAAAGCCATTTGGCCTAGCAACAAGGGAAGGCCTGGCAATTTCCC 

GCAGTCCAGGCCTGAGCCTACCGCACCCCCAGCCGAGAGCTTTAGATTCGGCATTAGCAAAAAGGCTAAGGGATGG 

TTTTACAGACACCATTACGATAGCCGACACCCTAAGGTCAGCTCCGAGGTCCACATTCCCCTCGGCATGATGACCG 

CTTGCCAAGGCGTCGGCGGACCCAGTCACAAAGCCAGGGTACTGGCAGAGGCTATATCCCAGGTGAACAACGCTAA 

CATTCCTCCCATTGTGGCCAAAGAGATTGTGGCAAACTGTGACAAATGCCAGCTCAAGAGTGAGGCTATTCACGGA 

CAGGTGAACTGTAGCCCTTCCGAGGGAACAAGACAGACTAGGAAGAACAGACGTAGAAGGTGGCGTGCGAGGCAAA 

GGCAAATCCACTCCATCTCCGAGAGGATTCTGGGACAGATGAGGGAACCCAGAGGCTCCGACATTGCGGGTACTAC 

AAGCACACTGCAAGAGCAAATCGCATGGATGACAAGCAATCGCCCTAGCATTCAACAAGAGTTTGGCATTCCCTAT

AACCCTCAGTCCCAGGGCGTCGTGGAAAGCATGAACAAAGAGCTAAAGAAAATCATTGGCAGACAGGAGATCCTCG 

ATCTCTGGGTCTACCATACCCAAGGCTATTTCCCTGACTGGCAGAATTACACACCCGGACCCGGAGTCAGATACCC 

TAGCAGAGAAAGACAGAGACAGATTCATTCTATTAACGAATGGATTCTCAGCAACTGCCTCGGCAGATCCGCTGAG 

CCTGTGCCTCTGCAACTGTATAAGACACTGAGAGCCGAACAGGCTACCCAAGAGGTCAAGAATTGGATGACCGAGA 

CACTGCTCGTGCAAAACGCTAACCCTGACTGTGAGAGAGTGTATCTGGCTTGGGTCCCCGCTCATAAAGGCATTGG 

CGGAAACGAACAGGTGGACAAACTGGTCAGCGCTGGCATTAGGAAAACAGACCCTAACCCTCAGGAAATCCATCTG 

GAAAACGTCACCGAGAACTTTAACATGTGGAAAAACGATATGGTGGAGCAAATGCATGAGGCTGOCTATGCCATTC 

TGAAATGCAATAACAAAAGGTTCAACGGAACTGGACCCAGTAAGAATGTGTCCACCGTCCAGTGTACCCATGGCCT 

AGAGCTCAAGAATAGCGCTATCTCCCTGCTCAACGCTACCGCTATCGCTGTGGCTGGGTGGACCGATAGGGTTATC 

GAAGTGGTTCAGTCCCGGCATCCCAAAGTGTCCAGCGAAGTGCATATCCCTCTGGGAGACGCTAGGCTCATCATTA 

GGACATACTGGGGCCTCCACACAGGCGCTGCTATGGGCGGTAAATGGTCCAAGTGCTCCCTCGTCGGATGGCCGGC 

AGTGAGAGAGAGAATCAGACAGACACCCCCTGCCGCTGAGGGAGTGCTCAAGACCGGCAAGTACTCTAGGAAGAGG 

GGTGCCCATACCAATGACGTCAAGCAACTGACAGAGGCTGTGCAAAAGATTGCCACAGAGTCTAGCTGGGAGGGTC 

TGAAATACTGGGGGAATCTGCTCCAGTACTGGGGCCAGGAACTGAAAATCTCCGCCGTCAGCCTCCTGAATGCCAC 

AGCCATTGAGCTGCCTGAGAAAGAAAGCTGGACCGTCAACGATATCCAAAAGCTCGTGGGAAAGCTCAACTGGGCA 

TCCCAGATTTACCCCGGAAGAGCCATTGAGGCTCAGCAACACATGCTGCAACTGACAGTGTGGGGCATTAAGCAAC 

TGCAAGCCAGAGTGCTCGCCATTGAGAGATACCTCGCCCTCCAGGATAGCGGATTGGAAGTGAATATGGTCACCGA 

TAGCCAATACGCTCTAGGCATCATTCAGGCTCAGCCTGACAAAAGCGAAAGGGAAATCTCCAACTATACCAATCAG 

ATTTACAAGATCCTCACCGAATCTCAAAATCAACAGGATAGGAATGAGAAAGACCTCCTGGCTCCCACAAAGGCTA 

AGAGAAGGGTCGTGCAAAGGGAAAAGCGTGGCGTCGGCATTGGCGCTATGTTTCTCGGATTCCTCGGCGCTGCCAA 

ACCCAAAATGATCGGAGGCATTGGAGGCTTTATCAAAGTCAGGCAGTATGACCAAATCCTTATCGAAATCTGTGGA 

AACAAGGCTATCTCCTACCATAGGCTCAGGGATTTCATTCTGATCGTCGCTAGGATTGTGGAACTGCTCGGCCGTA 

GCTCCCTGAAAGGCCTCCAGAGAGGCACACTGAATGCCTGGGTGAAAGTGATTGAGGAAAAGGGATTCAGTCCGGA 

AGTGATTCCCATGTTTTCCGCTCTGTCCGAGGGAGCCACACTCGAGtgaagatctgaattc
B2 fragment

ggatccaccATGCTCGAGAATATGCTCACCCAAATCGGATGCACACTGAATTTCCCTATCTCCCCCATTGACACAG 

TGCCTGTGAAACTGAAACCCGGAATGGATGGCGCCGCCATCTTTAGGCCTGGCGGAGGCAATATGAAAGACAATTG 

GAGAAGCGAACTGTATAAGTATAAGGTCGTGAAGATTAAGCCTCTGGGAATCACATGGATTCCCGAATGGGAGTTC 

GTCAACACACCCCCACTGGTCAAGCTATGGTATCAGCTGGAGAAAGAGCCTATCGTTGGCGCTGAGCCTCAGGATC 

TCAACACGATGCCGAATACTGTAGGAGGCCATCAGGCTGCTATGCAAATGCTGAAAGACACAATCAATGAGGAAGC 

CGCTGTCCTGTTTCTGGATGGCATTAACA^AGCTCAAGAGGAACATGAGAAGTATCACTCCAACTGGAGGACAATG 

GCCAACGACTTTAATCTGATGAAGCATCTCGTCTGGGCCTCTAGGGAGCTGGAGAGATTCGCTCTGAATCCCGGCC 

TGCTGGAGACATCCGAAGGCTGTT^GCAAATTGCTGAGGAAGAGATTATCATTAGGTCCGAGAATTTCACAAACAA 

TGTCAAAACCATTATCGTCCACCTCAACGAAAGCGTCGAGATTAACATGGGCGCTAGGGCAAGTGTCCTCAGCGGC 

GGCAAGCTGGACGCCTGGGAAAAGATTAGGCTCAGGCCTGGCGGCAAGAAAAAGTATAGGCTCAAGGAGAAGGGAG 

GCCTGGACGGACTGATTTACTCCCAAAAGAGGCAAGACATTCTGGATCTGTGGGTGTATAACACACAGGGATTCAC 

TAGATGGGGAACCTTGATCCTCGGCTTGGTGATTATCTGTAGCGCCAGCGAGAATCTGTGGGTGACAGTGTATTAC 

GGAGTGCCTGTGTGGAGGAGACAGCTCCTGTCCGGCATTGTGCAACAGCAAAATAACCTCCTGAGGGCTATCGAAG 

CCCAACAGCATCTGCTCCAGCTCACCGTCTGGGTCAGGCATTTCCCCAGGCCTTGGCTCCACAGCCTGGGACAGTA 

CATCTATGAGACATACGGAGACACATGGTCGGGAGTGGAAGCCCTCAAAGCCCTCATCAAACCCAAAAAGATTAAG 

CCTCCCCTCCCATCCGTGAAAAAGCTCACCGAAGACAAATGGAATAAGCCTCAAAAGACATATAGCGCTGGCGAAA 

GGATTGTCGATATCATTGCAACCGACATTCAGACTAAGGAACTGCAAAACCAAATCATAAAGATTCAGAATTTCGC 

TGTGTTTATCCATAACTTTAAGAGGAAGGGAGGCATTGGCGGCTACTCCGCCGGAGAGAGAATCATTGACATTATC 

GCCAGCGATATCGTTCCCGTGGGCGATATCTATAAGAGATGGATCATTCTGGGACTCAACAAAATCGTGAGAATGT 

ATTCACCCGTCAGCATTCTGGATATCAGAGTGAGACAGGGATACTCCCCCCTCAGCTTTCAGACACTGATGCCCGC

TCCCAGAGGCCCTGACAGACTCGAACGCATTGAGGAAGAGTCCAGGCAGGACCATCAGTATCCCATTTCCGAACAG 

CCTCTGTCTCAGACAAGGGGAGACAATCCCACAGACCCTAAGGAAAGCAAAAAGGCTAGTGGAGTGGTCGAGTCCA 

TGAATAAGGAACTGAAAAAGATTATCGGACAGGTCAGGGACCAGGCTGAGCACCTGAAAACCGCTGTGCAAATGGC 

TGCCATGCAGATGCTCAAGGATACCATTAACGAAGAGGCTGCCGAGTGGGACAGAATCCATCCCGTCCATGCCGGA 

CCCATTGCCCCTCTCACCGAGATTTGTAAAGAAATGGAAAAAGAAGGCAAAATCTCCAGGATTGGCCCTGAGAATC 

CCTATAACACACCCGTCTTTGCCATTCAAGTGAGAGACCAAGCCGAACACCTCAAGACAGCCGTCCAGATGGCAGT 

CTTCTVTTC^CAATTTCAAAAGGAAAGGCGGAATCGGAGGCAAAAAGAAAGATAGCACAAAGTGGAGGA 

GACTTTAGGGAGCTCAACAAACGTACACAGGATTTCTGGGAGGTCCAGCTCGGCTTTTCGGCTCTGGCTTGGGATG 

ACCTCAGGAGCCTGTGTCTGTTCAGCTATCACAGACTGAGAGACTTTATCCTCATCGTTGCCAGAACCTGCCGACA 

TAGCAGAATCGGCATCACTAGGCAACGTAGAGGTAGGAACGGCTCCTCCAGGTCCGCTGCCCCCAAAATCTCCTTC 

GACCCCATTCCCATTCACTATTGCGCTCCCGCTGGCTTCGCTATCCTCAAGTGTAACAATAAGACATTCAATGGCG 

AAAAGGATTGGCATCTGGGACAGGGAGTGTCCATCGAATGGAGAAAGAAAAGCTATAGCACACAGGTGGACCCTGA 

CCTCGCCGATCAGCCTAGCCTCTATCCTCCCTTAGCTTCCCTGAAAAGCCTCTTCGGAAACGATCCCTCATCCCAA 

GCCGCTAGAAGGGCTATCCTCGGCCAAATAGTCAGGAGAAGGTGTGAGTATCAGTCCGGACACAATAAGGTCGGCT 

CCCTGCAATACCTTGCACTCAGCCAACCCAAAACCGCTTGCTACAAGTGTTACTGTAAGAAATGTTGCTACCACTG 

TCAGGTCTGCTTCCTGAAGAAGGGACTGGGAATCAGGGATTACGGAAAGCAAATCGCTGGCGCTGACTGTGTGGCC 

AGCAGGCAAGACGAAGACGCAGCCAAGTACCATAGCAATTGGAGAACCATGGCCAGTGAGTTTAACCTCCCCCCTA

TCGTCGCTAAGGAAATCGTCGCAAGTTGTGATAAGTGTAACGAATGGACACTGGAACTGCTGGAGGAACTGAAACA 

TGAAGCCGTGAGACACTTTCCCAGACCCTGGCTGCATGGCCTCGGTCAACACGATATCATTAGCCTCTGGGATCAG 
TCCCTGAAACCCTGTGTGAAACTGACACCCCTCTGCGTCACCCTCAACTGTACCAATGCCAATCTGCTGAAGAGCT

ACTCCACCCAAGTGGACCCCGATCTGGCTGACCATCTGATTCACCTCCACTATTTCGATTGCTTTTCCGATAGCGC 

AATCCATCCCATGGGCCTACACGGAATGGAGGATGAGGAAAGGGAAGTGCTGAAATGGAAATTCGATAGCCATCTG 

GCTCTCAGGCATATCGCTTCTAGTCCTATCGATACCGTCCCCGTCAAGCTCAAGCCTGGCATGGACGGACCCAAAG 

TGAAACAGTGGCCCCTCACCGAAGAGAAAATCAAAGCCATTTGGCCTAGCAACAAGGGAGGGCCTGGCAATTTCCT 

GCAGTCCAGGCCTGAGCCTACCGCACCCCCAGCCGAGAACTTTAGATTCGGCATTAGCAAAAAGGCTAAGGGATGG 

TTTTACAGACACCATTACGAAAGCCAACACCCTAAGGTCAGCTCCGAGGTCCACATTCCCCTCAGCATGATGACCG 

CTTGCCAAGGCGTCGGCGGACCCAGTCACAAAGCCAGGGTACTGGCAGAGGCTATGTCCCAGGTGAACAACGCTAA 

CATTCCTCCCATTGTGCCCAAAGAGATTGTGGCAAACTGTGACAAATGCCAGCTCAAGGGTGAGGCTATGCACGGA 

CAGGTGGACTGTAGCCCTTCCGAGGGATCAAGACAGGCTAGGAAGAACAGACGTAGAAGGTGGCGTGAGAGGCAAA 

GGCAAATCCGCGCCATCTCCGAGTGGATTCTGGGACAGATAAGGGAACCCAGAGGCTCCGACATTGCCGGTACCAC 

AAGCACACTGCAAGAGCAAATCGCATGGATGACAAACAATCCCCCTGGCATTAAGCAAGAGTTTGGCATTCCCTAT 

AACCCTCAGTCCCAGGGCGTCGTGGAAAGCATGAACAAAGAGCTCAAGAAAATCATTGGCAGACAGGAGATCCTCG 

ATCTCTGGGTCTACAATACCCAAGGCTTTTTCCCTGACTGGCAGAATTACACACCCGGACCCGGAATCAGATACCC 

TAGCAGAGCAAGACAGAGACAGATTCATGCTATTAGCGAAAGGATTCTCAGCAACTTCCTCGGCAGACCCGCTGAG 

CCTGTGCCTCTGCAACTGTATAAGACACTGAGAGCCGAACAGGCTACCCAAGAGGTCAAGAATTGGATGACCGACA 

CACTGCTCGTGCAAAACGCAAACCCTGACTGTGAGAAAGTGTATCTGGCTTGGGTCCCCGCTCATAAAGGCATTGG 

CGGAAACGAACAGGTGGACAAACTGGTCAGCGCTGGCATTAGGAAAACAGACCCTAACCCTCAGGAAATCGATCTG 

GAAAACGTCACCGAGAACTTTA^CATGTGGAAAAACAATATGGTGGAGCAAATGCAAGAGGCTGGCTATGCCATTC 

TGAAATGCAATAACAAAAAGTTcbvACGGAACTGGACCCTGTAAGAATGTGTCCACCGTCCAGTGTACCCATGGCCT 

AGAGCTCAAGAATAGtGCTGTCTCCCTGCTCAACGCTACCGCTATCGCTGTGGCTGAGTGGACGGATAGGGTTATC 

GAAGTGGTTCAGTCCCAGCATCCCAAAGTGTCCAGCGAAGTGCATATCCCTCTGGGAGACGCTAGGCTCGTCATTA 

AGACATACTGGGGCCTCCACACAGGCGCTGCTATGGGCGGTAAATGGTCCAAGTGCTCCCTCGTCGGATGGGCCGC 

AGTGAGAGAGAGAATCAGACAGACACCCCCTGCCGCTGAGGGAGTGCTCAAGACCGGCAAGTACTCCAGGATGAGG 

AGTGCCCATACCAATGACGTCAAGCAACTGACAGAGGTTGTGCAAAAGATTGCCACAGAGTCTAGCTGGGAGGGTC 

TGAAATACTTGTGGAATCTGCTCCTGTACTGGGGCCTGGAACTGAAAAACTCCGCCGTCAGCCTCCTGAATGCCAC 

AGCCATTGTGCTGCCTGAGAAAGAAGGCTGGACCGTCAACGATATCCAAAAGCTCGTGGGAAAGCTCAACTGGGCA 

TCCCAGATTTACGCCGGAAGAGCCATTGAGGCTCAGCAACACTTGCTGCAACTGACAGTGTGGGGCATTAAGCAAC 

TGCAAGCCAGAGTGCTCGCCATTGAGAGATACCTCGCCCTCCAGGATAGCGGATCGGAAGTGAATATCGTCACCGA 

TAGCCAATACGCTCTAGGCATCATTCAGGCTCAGCCTGACAAAAGCGAAAGGGAAATCTCCAACTATACCAATCAG 

ATTTACAAGATCCTCACCGAATCTCAAAATCAACAGGATAGGAATGAGCAAGAACTCCTGGCTCCCACAAAGGCTA 

AGAGAAGGGTCGTGCAAAGGGAAAAGCGTGCCGTCGGCATTGGCGCTATGTTTTTCGGATTCCTCGGCGCTGCCAA 

ACCCAAAATGATCGGAGGCATTGGAGGCTTTATCAAAGTCAGGCAGTATGACCAAATCCTTATCGAAATCTGTGGA 

CAGAAGGCTATCTCCTACCATAGGCTCAGGGATTTCATTCTGATCGTCGCTAGGATTGTGGAACTGCTCGGCCATA 

GCTCCCTGAGAGGCCTCCGGAGAGGCACACTGAATGCCTGGGTGAAAGTGGTTGAGGAAAAGGGATTCAATCCCGA 

AGTGATTCCCATGTTTACCGCTCTGTCCGAGGGAGCCACACTCXSAGtgaagatctgaattc 
C1 fragment

ggatccaccATGCTCGAGAGCAACACACCCGCTAATAATGCCGATTGCGCGTGGCTGAAAGCCCAGGAAGAGGAAG 
AAGTGGGATTTCCTGTGAGACCCCAAGTGCCTAGAGCTTGGAGGGCTATCCTCAACATTCCCAGGAGGATTAGGCA
AGGCTTTGAGAGAGCCCTCCTAGCCGCCGAATGGGACAGGGTTCACCCTGTGCACGCTGGCCCTGTCGCTCCCGGC
CAAATGAGAGAGCCCAGAGGAAGCGATATCGCTGGCACAACCCTCAGGCCCATGACATATAAGGCCGCTATTGACC 
TCAGCTTGTTTCTGAAAGAGAAAGGCGGACTGGAAGGCCTCATCTATAGCAAGAAAGCTGCTATGGAACAGGCTCC 
CGAAGACCAAAGCCCTCAGAGAGAGCCTTAtlAATGAGTGGACCCTGGAGCTCCTGGAAGAGCTCAAGAAAGAGGCT 
CAAGGCCAATGGACCTACCAAATCTTTCAGGAACCCTTTAAGAATCTGAAAACCGGAAAGTATTCCAGAATGAGAA 
GCGCTCACACAAACTGGATGACAGAAACCCTCCTGGTCCAGAATGCCAATCCCGATTGCAAGTCCATCCTCAGGGC 
TCTGGGAACCGGAGCCACACTGGAAGAGCCTGAGGTCATCCCTATGTTCTCAGCCCTCAGCGAAGGCGCTACCCCC
CAAGACCTGAATACGATGCTCAACATCGTCAGCGGACACCAATCCACCCTCCAGGAACAGATTGGCTGGATGACAA 
ATAACCCTCCCATCCCTGTCGGAGAGATTTACAAAAGGTGGATTATCCTCGGCCTGACTAGAATCCCCCATCCCGC 
CGGCCTCAAGAAAAAGAAAAGCGTCACCGTCCTGGATGTGGGAGACGCTTACTTCAGCGTCCCCCTCGACGAAGAC 
CAAAAGGAAACCTGGGAGGCTTGGTGGACGGAATACTGGCAGGCTACCTGGATTCCTGAGTGGGAGTTTGTGAATA
CCCCTCCCCTCGTGTTTCCCGATTGGCATAACTATACCCCTGGCCCTGGCATAAGGTATCCCCTCACCTTTGGATG
GTGCTTTAAGCTCGTGCCTGTGGACCCCAAACTGTGGTACCAACTGGAAAAGGAACCCATTGTCGGAGCCGAAACC 
TTTTACGTGGACGGAGCCGCCAACAGAGAGACAAAGCTCGGCCAAAACGTCCAGGGACAGATGGTGCATCAGGCTA 
TTAGCCCCAGGACCCTCAACGCTTGGGTCAAGGTCGTCGAAGAGAAAGCCTTTAACGAAACCGAAGTGCATAACGT 
CTGGGCTACCCATGCCTGTGTGCGTACCGATCCCAATCCCCAAGAGATTCTCCTGGAGAATGTGACAGAGCTCAAG
GATCAGAAACTCCTCGGCATTTGGGGATGCTCCGGCAAAATCATTTGCACAACCACTGTGCCTTGGAACAGCTCCT 
GGTCCAACCAAGCTGGCCATAACAAAGTGGGAAGCCTCCAGTATCTGGCTCTGACGGCTCTGATTAAGCCTAAGAA 
AATCAAACCCCCTCTGCCTAGCGTTAAGACAATCATTGTGCATCTGAATGAGTCCGTGGAAATCAATTGCACAAGG 
CCTAACAATAACACAAGGAAAGCCGCCGCTAGTGAAGTACGGAATAAGTCCAAACAGAAAACCCAGCT^AGCTGCCG 
CCGATACAGGCGACTCCAGCCAGGTCAGCCAAAACTATCCCATTGTGTCCAACTTTACCTCCACCACTGTGAAAGC 
CGCTTGTTGGTGGGCCAATATCAAACAGGAGTTTGGAATCCCTTACAATCCCCAAAGCCAAACATTCTATGTGGAT 
GGCGCTGCCAATAGGGAAACCCAACTGGGAAAGGCGGGCTATGTGACAGACAAAGGCAGACAGAAAGTCATTAGCG 
GAATCTGGCAGCTCGACTGTACCCATCTGGAAGGCAAAGTCATTCTGGTAGCCGTCCACGTCGCCTCCGGCTACAT
TGAGGCTGAGGTCGGCAATGAGCAAGTGGATAAGCTCGTGAGTTCCGGAATCAGAAAGGTGCTATTCCTCGACGGA 
ATCAATAAGGCTCAGGAAGAGCACGAAGTCAGGGAAAGGATTAGGCGAACCGCTCCCGCTGCTGAAGGCGTCGGCG 
CTGTCTCCCAGGATCTGGATAAGTACGGAGCCCTCACCTCCACAAGCGGAACCCAACAGTCCCAGGGAACTGAAAC 
TGGCGTCGGCAACCCTCAGATTTTGGGAGAGTCCAGCGTTGTCCTCGGCTCCGGCTCCATCGTCATCTGGGGTAAA 
ACCCCTAAGTTTAAGTTCCCCATTCAGAAAGAGACATGGGAAGCCTGGTGGACGGAGTATTGGCAAGCCGCTGCTT 
ACAGACTGATCAGCTGTAACACAAGCGTTATCAAACAGGCTTGCCCTAAGATTACCTTTGACCCTATCCCTATCCA 
TTACTGTGCCCCTCCTAGCTGGATGGGCTATGAGCTCCACCCTGACAGATGGACAGTGCAACCCATCGTGCTCCCC 
GAAAAGGACTCCTGGACAGTGAATGACATTCAGAAATCAATTCTGAGAGCCCTCGGCCCAGGCGCTTCCCTGGAGG 
AAATGATGACAGCATGTCAGGGAGTGGGAGGCCCTGGCCATAAGGCTAGAGTGTATTACAGAGACTCCAGGGACCC 
CATTTGGAAAGGCCCTGCCAAACTGCTCTGGAAAGGCGAAGGCGCTGTGGTCATCCAAGACATTAAGATTGGAGGC 
CAACTGATAGAAGCCCTCCTGGATACAGGAGCCGATGACACCGTCCTGGAAGATATGAATCTGCCTGGCAAGTGGG
GAATCAAACAGCTCCAGGCTAGGGTCCTGGCTATCGAGAGGTATCTGAAAGATCAACAGTTTCTGGGACTCTGGGG 
CTGTAGCGGAAAGGCTGCTATGGAAAACAGATGGCAAGTGATGATCGTCTGGCAAGTGGACAGGATGAAGATTAGG 
ACATGGAATAGCCTCGTGAAACACCATATGTATATTATCTGTACCACAACCGTCCCCTGGAACTGCACCTGGAGCA 

ATAAGTCCTTCGAAGAGATTTGGAATAACATGACCTGGATTCAATGGCTGATTCTCGCTATCGTCGTGTGGACCAT 

TGTGTATATCGAATACAAGAAACTGCTCAGGCAAAGGAGAATCGATAGGCTCATCAAAAGGCTCAACCCTGGCCTC 

CTGGAAACCGCTGAGGGATGTAAACAGATCCTGGAACAGCTCCAGCCCGCCCTCCAGACAGGCACCGAAGAGCTCT 

CTAGTAGAAAGCTCCTGAAACAGAGAAAGATTGACAGACTGATTGAGAGAATCAGAGAGAGAGCCGAAGACTCGGG 

CAATGAGTCCGAGGGAGACACACCCGGAATCAGATACCAATACAATGTGCTCCCCCAAGGCTGGAAGGGCTCCCCA 

CCCATTTTCCAAAGCTCCATGACCCAAATCCTCATGATGCAAAGGGGAAACTTTAAGGGACAGAAAAGGATTATCA 

AGTGCTTCAACTGTGGAAAGGAAGGCCATCTCGCTAGGAATTGCAGACCTCCCCTAGAGAGACTGAACCTGGATTG 

CTCCGAGGATAGCGACACCTCCGGCACACAGCAAAGCCAAGGCACAGAGACAGAAGTGGGACTCGTGGCTGTGCAT 

GTGGCCAGCGGATATATCGAAGCCGAAGTGATCCCTGCCGAAACTGGACAGGAAACCGCTTACTTTATCCTCAAGA 

TTAAGCCTGTGGTCAGCACACAGCTCCTGCTCAACGGTAGCCTCGCTGAAGAGGAAATCATTATCAGAAGCGAAAA 

CTTTACCGATAACAAACTGGTCGGCAAACTGAATTGGGCTTCCCAAATCTACGCTGGCATCAAAGTGAAGCAACTG 

TGTAAGCTCCTGAGAGGCACCAAAGCCCTCACTCCTCTGTGTGTGACACTGAATTGCACAAACGCTAACCTCATCA 

ATGTGAATGCTGCTCAAACCAGAGGCGATAACCCTACCGGTCCCGAAGAGTCCAAGAAAGAGGTCGCGTCCAAGAC 

AGAGACAGACCCTTGTGACGCCGCCCCTAGCTCCAACTTTCTGGGAAGGTCTGCCGAACCCGTCCCCCTCCAGCCC 

CCCCCTCTGGAAAGGCTCCACCTCGACTGTAGCGAAGACTGTGGCGAACTGGATAAGTGGGCCTCCCTGTGGAACT 

GGTTCAATATCACCAACTGGCTGTGGTACATTAAGATTTTCATTATGATTGTGGGAGGCAATAAGATTGTCAGGAT 

GTACTCACCTGTCTCCATGCTCGACATTAAGCAAGGCCCTAAGGAACCCTTCAGGGATTACGTGGACAGATTCGCT 

AAGCTCCTGTGGAAGGGAGAGGGAGCCGTCGTGATTCAGGACAACTCCGACATTAAGGTCGTGCCCAGGAGAAAGG 

CTAAGATTATCGAACTGAATAAGAGAACCCAAGACTTTTGTGAAGTGCAACTGGGAATCCCTCACCCTGCTGGACT 

GAAGAAGAAAAAGTCAGTGACAGTGGCCGCTATGAGAGTGAAAGAGACACAGATGAACTGGCCCAATCTGTGGAAG 

TGGGGCACAATGATTCTGGGACTGGTCATCATTTGCTCCGCCTCCATTAAGGTCAGACAGCTCTGCAAACTGCTCA 

GGGGTACAAAGGCTCTGACAGAGATTGTGACACTGACAGAGGAAGCCGAACTGGAACTGCTCATATGGAAGTTTGA 

CTCCCGCCTCGCCCTGAGACATATCGCCAGGGAACTGCATCCCGAGTTCTACAAAGACTGCGCTGCTGTCGAGCTC 

CTGGGACGCTCCAGCCTCAAGGGACTGCAAAGGGGATGGGAAGGCCTCAAGTATTTGTGGAACCTCCTGCAGTATT 

GGGGCTCTAGCCTGGGGCAACTGCAACCTGCTCTGAAAACCGGATCAGAGGAACTGAAGTCCCTGTATAACACAAT 

CGCTACCCTCTGGTGTGTGCATCAGGAGCTCTACAAATACAAAGTGGTCAAAATCAAACCCCTCGGCATTGCGCCT 

ACCAGAGCCAAAAGGAGAGTGGTCGAGAGAGAGAAAAGGCTCACCGAAATCGTCCCACTCACCGAAGAGGCTGAGC 

TGGAGCTGGAGGAAAACAGAGAGATTCTGAGGGAACCCGTCCACGGAGTGTATAGAGTGCTCGCCGAAGCCATGAG 

CCAAGTCAACAATGCCAACATCATGATGCAGAGAGGCAATTTCAAAGGCCTAAAGAGAATCATCAAACAAGAGGAA 

GAGGAGGTGGGCTTCCCCGTCAGGCCCCAGGTCCCACTGAGACCTATGACCTACAAAGGAGCGGTCGATCTGTCCT 

TCTTCAGACAGGGACCCAAAGAGCCTTTCAGAGACTATGTGGATAGGTTTTTCAAAACCCTCAGGGCTGAGCAAGC 

CTCACAGGAAGTGAAAAACTGGGAGAAAATCAGACTGAGACCTGGTGGCAAAAAGAAATACAAAATGAAACACATT 

GTGTGGGCCTCCAGGGAACTGGAAAGGTTTGCCTCCCAGTATGCCCTCGGCATCATCGTAGCCCAACCCGATAAGT 

CCGAGTCCGAGCTCGTGAATCAGATTATCGAAGAGCTCATCAAGAAGATTGCCGTCGCCGGATGGACAGACAGAAT 

CATTGAGGTCGACCAAAGGGCTTGGAGAGCCATTCTGAATATCCCCAGGAGAATCAGACAGACTAGACTCGCCGGA 

AGGTGGCCCGTCAGGACAATCTATACCGATAACGGAAGCAATTTCACAAGCGCTACCGTCAAGGCTGCCTGCTGGT 

GGGCTGATGTGAAACAGCTCACCGCAGTCGTCCAGAAAATCGCTACCGAAAGCATTGTGATATGGGGAAAGACGCC 

CAAGTTCAGACTGCCTATCGCTGCCGCCAGCAACGAGAACATGGAGACCATGGCTGCTtgaagatctgaattc 
ggatccaccATGCTCGAGAGCAACACAGCCGCTAACAATACCGATTGCGTGTGGCTGAAAGCCCAGGAAGAGGAAG 

AAGTGGGATTTCCTGTGAGACCCCAAGTGCCTAGAGCCGGGAGGGCTATCCTCAACATTCCCACGAGGATTAGGCA 

AGGCCTTGAGAGAGCCCTCCTAGCCGCCGAATGGGATAGGATTCACCCTGTGCACGCTGGCCCTATCGCTCCCGGC 

CAAATGAGAGAGCCCAGGGGAAGCGATATCGCTGGCACAACCCTCAGGCCCATGACATATAAGGCCGCTATTGACC 

TCAGCTTGTTTCTGAAAGAGAAAGGCGGACTGGATGGCCTCATCTATAGCAAGAAAGCTGCTATGGAACAGGCTCC 

CGAAGACCAAAGCTCTCAGAGAGAGCCTTACAATGAGTGGACCCTGGAGCTCCTGGAAGAGCTCAAGCACGAGGCT 

CAAGGCCAATGGACCTTCCAAATCTTTCAGGAACCCTTTAAGAATCTGAAAACCGGAAAGTATGCCAGAATGAGAG 

GCGCTCACACAAACTGGATGACAGATACCCTCCTGGTCCAGAATGCCAATCCCGATTGCAAGTCCATCCTCAAGGC 

TCTGGGACCCGGAGCCTCACTGGAAGAGCCTGAGGTCATCCCTATGTTCTCAGCCCTCAGCGAAGGCGCTACCCCC 

CAAGACCTGAATATGATGCTCAACACCGTCGGCGGACACCAATCCACCCTCCAGGAACAGATTGGCTGGATGACAA 

ATAACCCTCCCATCCCTGTCGGAGAGATTTACAAAAGGTGGATTATCCTCGGCCTGACTAGAATCCCCCATCCCGC 

CGGCCTCAAGAAAAAGAAAAGCGTCACCGTCCTGGATGTGGGAGACGCTTACTTCAGCGTCCCCCTCGACGAAGGC 

CAAAGGGAAACCTGGGAGGCTTGGTGGATGGAATACTGGCAGGCTACCTGGATTCCTGAGGGGGAGTTTGTGAATA 

CCCCTCCCCTCGTGTTTCCCGATTGGCAAAACTATACCCCTGGCCCTGGCACAAGGTATCCCCTCACCTTTGGATG 

GTGCTTTAAGCTCGTGCCTGTGGACCCCAAACTGTGGTACCAACTGGAAAAGGACCCCATTGTCGGAGTCGAAACC 

TTTTACGCGGACGGAGCCGCCAACAGAGAGACAAAGCTCGGCCAAAACGTCCAGGGACAGATGGTGCATCAGCCTA 

TTAGCCCCAGGACCCTCAACGCTTGGGTCAAGGTCATCGAAGAGAAAGGCTTTAGCGACACCGAAGTGCATAACGT 

CTGGGCTACCCATGCCTGTGTGCCTACCGATCCCAATCCCCAAGAGATTCTCCTGGAGAATGTGACAGAGCTCAAG 

GATCAGAAACTCCTCGGCATTTGGGGATGCTCCGGCAAACTCATTTGCACAACCACTGTGCCTTGGAACAGCTCCT 

GGTCCAACCCAGCTGGCCATAACAAAGTGGGAAGCCTCCAGTATCTGGCTCTGAAGGCTCTGATTACGCCTAAGAA 

AATCAAACCCCCTCTGCCTAGCGTTAAGACAATCATTGTGCATCTGAATGAGTCCGTGGAAATCAATTGCACAAGG 

CCTAACAATAACACAAGGACAGCCGCCGCTAGTGAAGTACAGAATAAGTCCAGACAGAAAACCCAGCAAGCCGCCG 

CCGATACAGGCAGCTCCAGCAAGGTCAGCCAAAACTATCCCATTGTGTCCAACTTTACCTCCACCACTGTGAAAGC 

CGCTTGTTGGTGGGCCAATATCAAACAGGAGTTTGGAATCCCTTACAATCCCCAAAGCCGAACATTCTATGTGGAT 

GGCGCTGCCAATAGGGAAACCAAACTGGGAAAGGCTGGCTATGTGACAGACAGAGGCAGACAGAAAGTCGTTAGCG 

GAATCTGGCAGCTCGACTGTACCCATCTGAAAGGCAAAGTCATTCTGGTAGCCGTCCACGTCGCCTCCGGCTACAT 

TGAGGCTGAGGTCGGCAATGAGCAAGTGGATAAGCTCGTGATTTCCGGAATCAGAAAGGTGCTATTCCTCGACGGA 

ATCGATAAGGCTCAGGAAGAGCACGAAGTCAGGGAAAGGATTAGGCGAGCCGCTCCCGCTGCTGAAGGCGTCGGCG 

CTGTCTCCCAGGATCTGGATAAGTACGGAGCCATCACCTCCACAAGCGGAACCCAACAGTCCCAGGGAACTGAAAC 

TGGCGTCGGCAACCCTCAGATTTTGGGAGAGTCCAGCGCTGTCCTCGGCTCCGGCTCCATCGTCATCTGGGGTAAA 

ACCCCTAAGTTTAAGCTCCCCATTCAGAAAGAGACATGGGAAACCTGGTGGATGGACTATTGGCAAGCCGCTGCTT 

ACAGACTGATCAGCTGTAACACAAGCGTTATCACACAGGCTTGCCCTAAGATTAGCTTTGAGCCTATCCCTATCCA 

TTACTGTGCCCCTCCTAGCTGGATGGGCTATGAGCTCCACCCTGACAGATGGACAGTGCAACCCATCGTGCTCCCC 

GAAAAGGAGTCCTGGACAGTGAATGACATTCAGAAAACAATTCTGAAAGCCCTCGGCCCAGGCGCTACCCTGGAGG 

AAAATATGACAGCATGTCAGGGAGTGGGAGGCCCTGGCCATAAGGCTAGAGTGTATTACAGAGACTCCAGGGACCC 

CATTTGGAAAGGCCCTGCCAAACTGCTCTGGAAAGGCGAAGGCGCTGTGGTCATCCAAGACATTAAGATTGGAGGC 

CAACTGAAAGAAGCCCTCCTGGATACAGGAGCCGATGACACCGTCCTGGAAGATATCAATCTGCCTGGCAAGTGGG 

GAATCAAACAGCTCCAGGCTAGGGTCCTGGCTATCGAGAGGTATCTGAAAGATCAACAGCTTCTGGGAATCTGGAG 

CTGTAGCGGAAAGGCTGCTATGGAAAACAGATGGCAAGTGATGATCGTCTGGCAAGTGGACAGGATGAAGATTAGG 
ACATGGAATAGCCTCGTGAAACACCATATGTATCTTATCTGTACCACAGCCGTCCGCTGGAACTCCACCTGGAGCA 

ATAAGTCCTTCGAAGAGATTTGGAATAACATGACCTGGATTGAATGGCTGATTATCGCTATCGTCGTGTGGACCAT 

TGTGTTTATCGAATACAAGAAACTGCTCAGGCAAAGGAAAATCGATAGGCTCATCGAAAGGCTCAACCCTGGCCTC 

CTGGAAACCGCTGAGGGATGTAAACAGATCCTGGAACAGCTCCAGCCCGCCCTCAAGGCAGGCACCGAAGAGCTCT 

CTAGTAGAAAGCTCCTGAGACAGAGAAAGATTGACAGACTGATTGAGAGAATCAGAGAGAGAGCCGAAGACTCCGG 

CAATGAGTCCGAGGGAGACACACCCGGAATCAGATACCAATACAATGTGCTCCCCCAAGGCTGGAAGGGCTCCCCA 

GCCATTTTCCAAAGCTCCATGACCAAAATCCTCATGATGCAAAGGGGAAACTTTAAGGGACAGAAAAGGATTATCA 

AGTGCTTCAACTGTGGAAAGGAAGGCCATCTGGCTAGGAATTGCAGACCTCCCCTGGAGAGACTGAACCTGGATTG 

CTCCGAGGATAGCGACACCTCCGGCACACAGCAAAGCCAAGGCACAGAGACAGGAGTGGGACTCGTGGCTGTGCAT 

GTGGCCAGCGGATATATCGAAGCCGAAGTGATCCCTGCCGAAACTGGACAGGAAACCGCTTACTTTCTCCTCAAGA 

TTAAGCCTGTGGTCAGCACACAGCTCCTGCTCAACGGTAGCCTCGCTGAAGAGGAAATCATTATCAGAAGCGAAAA 

CTTTACCAATAACAAACTGGTCGGCAAACTGAATTGGGCTTCCCAAATCTACCCTGGCATCAAAGTGAGGCAACTG 

TGTAAGCTCCTGAGAGGCACCAAAGCCCTCACCCCTCTGTGTGTGACACTGAATTGCACAAACGCTAACCTCATCA 

ATGTGAATGCTGCTCAACCCAGAGGCGATAACCCTACCGATCCCAAAGAGTCTAAGAAAGAGGTCGCGTCCAAGGC 

AGAGACAGACCCTTTTGAGGCCGCCCCTAGCTCCACCTTTCTGGGAAGGTCTGTCGAACCCGTCCCCCTCCAGCTC 

CCCCCTCTGGAAAGGCTCCACCTCGACTGTAGCGAAGACAGTGACGAACTGGATAAGTGGGCCTCCCTGTGGAACT 

GGTTCAATATCACCAACTGGCTGTGGTACATTAAGATTTTCATTATGATTGTGGGAGGCAATAAGATTGTCAGGAT 

GTACCAACCTGTCTCCATCCTCGACATTAAGCAAGGCCCTAAGGAACCCTTCAGGGATTACGTGGACAGATTCGCT 

AAGCTCCTGTGGAAGGGAGAGGGAGCCGTCGTGATTCAGGACAACTCCGACATTAAGGTCGTGCCCAGGAGAAAGG 

CTAAGATTATCGAACTGAATAAGAGAACCCAAGACTTTTGGGAAGCGCAACTGGGAATCCCTCACCATGCTGGACT 

GAAAAAGAAAAAGTCCGTGACAGTGGCCGCTATGAGAGTGAAAGAGACACAGATGAACTGGCCCAATCTGTGGAAG 

TGGGGCACAATGATTCTGGGACTGGTCATCATTTGCTCCGCCTCCATTAAGGTCAAACAGCTCTGCAAACTGCTCA 

GGGGTGCAAAGGCTCTGATAGACATTGTGCCACTGACAGAGGAAGCCGAACTGGAACTGCTCATATGGAAGTTTGA 

CTCCCACCTCGCCCTGAGACATATCGCCAGGGAACTGCATCCCGAGTACTACAAAGACTGCGCTGCTGTCGAGCTC 

CTGGGACGCTCCAGCCTCAAGGAACTGCGAAGGGGATGGGAAGCCCTCAAGTATTTGTGGAACCTCCTGCAGTATT 

GGGGCTCTAGCCTGGAGCT^ACTGCAATCTGCTCTGAAAACCGGATCAGAGGAACTGAGGTCCCTGTTTAACACAGT 

CGCTACCCTCTGGTGTGTGCATCAGGAGCTCTACAAATACAAAGTGGTCAAAATCGAACCCCTCGGCATTGCCCCT 

ACCAAAGCCAAAAGGAGAGTGGTCCAGAGAGAGAAAAGGCTCACCGATATCGTCACACTCACCGAAGAGGCTGAGC 

TGGAGCTGGAGGAAAACAGAGAGATTCTGAAGGAACCCGTCCACGGAGTGTATAGAGTGCTCGCCGAAGCCATGAG 

CCAAGCCAACAATGCCAACATCATGATGCAGAGAGGCAATTTCAGAGGCCCAAAGAGAATCATCAAACAAGAGGAA 

GAGGGGGTCGGCTTCCCCGTCAGGCCTCAGGTCCCACTGAGACCTATGACCTACAAAGCAGCCATCGATCTGTCCT 

CTCACAGGAAGTGAAAAACTGGGAGAAAATCAGACTGAGATCTGGTGGCAAAAAGAAATACAAACTGAAACACATT 
GTGTGGGCCTCCAGGGAACTGGAAAGGTTTGCCTCCCAGTATGCCCTCGGCATCATCCTAGCCCAACCCGATAAGT 
CCGAGTCCGAGCTCGTGAGTCAGATTATCGAAGAGCTCATCAAGAAGATTGCGGTCGCCGGATGGACAGACAGAGT 
CATTGAGGTCGTCCAAAGGGCTTGGAGAGCCATTCTGAATATCGCCAGGAGAATCAGACAGACTAGACTCGGGGGA 
AGGTGGGCCGTCAAGATAATCCATACGGATAACGGAAGCAATTTCACAAGCACTGCCGTCAAGGCTGCCTGCTGGT 
GGGCTGATGTGAAACAGCTCACCGAAGTCGTTCAGAAAATCGCTACCGAAAGCATTGTGATATGGGGAAAGACACC 
CAAGTTCAGACAGCCTATCGCTGCCGCCAGCAACGAGAACATGGACGCCATGGCTGCTtgaagatctgaattc 